Abstract

We present two approaches for next-step linear prediction of long-memory time series. The
first is based on the truncation of the Wiener-Kolmogorov predictor by restricting the observations
to the last k terms, which are the only available values in practice. Part of the mean-squared
prediction error comes from the truncation, and another part comes from the parametric
estimation of the parameters of the predictor. By contrast, the second approach is non-parametric.
An AR(k) model is fitted to the long-memory time series and we study the error made with this
misspecified model.

ARMA (autoregressive moving-average) processes are often called short-memory processes because
their autocovariances decay rapidly (i.e. exponentially). By contrast, a long-memory process is
characterised by the following feature: its autocovariance function $\sigma$ decays more slowly,
i.e. it is not absolutely summable. Such processes are so named because of the strong association
between observations widely separated in time. Long-memory time series models have attracted
much attention lately and there is now a growing realisation that time series possessing
long-memory characteristics arise in subject areas as diverse as economics, geophysics, hydrology
and telecom traffic (see, e.g., Mandelbrot and Wallis (1969) and Granger and Joyeux (1980)).
Although there exists a substantial literature on the prediction of short-memory processes (see
Bhansali (1978) for the univariate case or Lewis and Reinsel (1985) for the multivariate case),
there are fewer results for long-memory time series. In this paper, we consider the question of
the prediction of the latter.

More precisely, we will compare two prediction methods for long-memory processes. Our goal
is a linear predictor $\hat{X}_{k+1}$ of $X_{k+1}$ from observed values which is optimal in the
sense that it minimizes the mean-squared error $E[(X_{k+1} - \hat{X}_{k+1})^2]$. This paper is
organized as follows. First we introduce our model and our main assumptions. Then in Section 2 we
study the best linear predictor, i.e. the Wiener-Kolmogorov predictor proposed by Whittle (1963)
and by Bhansali and Kokoszka (2001) for long-memory time series. In practice, only the last k
values of the process are available. Therefore we need to truncate the infinite series which
defines the predictor, and we derive the asymptotic behaviour as $k \to +\infty$ of the
mean-squared error. Then we propose an estimator of the coefficients of the infinite
autoregressive representation based on a realisation of length T. Under the simplifying
assumption that the series used for estimation and the series used for prediction are generated
from two independent processes which have the same stochastic structure, we obtain an
approximation of the mean-squared prediction error when $T \to +\infty$ and then $k \to +\infty$.

In Section 3 we discuss the asymptotic properties of the forecast error when we fit a misspecified
AR(k) model to a long-memory time series. This approach was proposed by Ray (1993) for
fractional noise series F(d). The simulations reported there show that high-order AR models
forecast fractionally integrated noise very well. In that case we also study the consequences of
the estimation of the forecast coefficients. Therefore we shall rewrite the heuristic proof of
Theorem 1 of Ray (1993) and develop a generalization of this result to a larger class of
long-memory models. We conclude by comparing our asymptotic approximation for the global
prediction error of long-memory processes with that of Berk (1974) and Bhansali (1978) in the
case of short-memory time series. Subsidiary proofs are given in the Appendix.



1 Model 

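Let $(X_n)_{n\in\mathbb{Z}}$ be a discrete-time (weakly) stationary process in $L^2$ with mean 0
and let $\sigma$ be its autocovariance function. We assume that the process
$(X_n)_{n\in\mathbb{Z}}$ is a long-memory process, i.e.:

\[ \sum_{k=-\infty}^{\infty} |\sigma(k)| = \infty. \]

The process $(X_n)_{n\in\mathbb{Z}}$ admits an infinite moving average representation as follows:

\[ X_n = \sum_{j=0}^{\infty} b_j \varepsilon_{n-j} \tag{1} \]

where $(\varepsilon_n)_{n\in\mathbb{Z}}$ is a white-noise series consisting of uncorrelated random
variables, each with mean 0 and variance $\sigma_\varepsilon^2$, and $(b_j)_{j\in\mathbb{N}}$ are
square-summable. We shall further assume that $(X_n)_{n\in\mathbb{Z}}$ admits an infinite
autoregressive representation:

\[ \varepsilon_n = \sum_{j=0}^{\infty} a_j X_{n-j} \tag{2} \]

where the $(a_j)_{j\in\mathbb{N}}$ are absolutely summable. We assume also that
$(a_j)_{j\in\mathbb{N}}$ and $(b_j)_{j\in\mathbb{N}}$, occurring respectively in (2) and (1),
satisfy the following conditions for all $\delta > 0$:

\[ |a_j| \le C_1\, j^{-d-1+\delta} \tag{3} \]
\[ |b_j| \le C_2\, j^{d-1+\delta} \tag{4} \]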

where $C_1$ and $C_2$ are constants and $d$ is a parameter verifying $d \in\, ]0, 1/2[$. For
example, a FARIMA process $(X_n)_{n\in\mathbb{Z}}$ is the stationary solution to the difference
equations:

\[ \phi(B)(1-B)^d X_n = \theta(B)\varepsilon_n \]

where $(\varepsilon_n)_{n\in\mathbb{Z}}$ is a white-noise series, $B$ is the backward shift
operator and $\phi$ and $\theta$ are polynomials with no zeroes on the unit disk. Its coefficients
verify conditions (3) and (4). In particular, if $\phi = \theta = 1$ then the process
$(X_n)_{n\in\mathbb{Z}}$ is called fractionally integrated noise and denoted F(d). More
generally, series like:

\[ |a_j| \underset{+\infty}{\sim} L(j)\, j^{-d-1}, \qquad |b_j| \underset{+\infty}{\sim} L'(j)\, j^{d-1}, \]

where $L$ and $L'$ are slowly varying functions, also verify conditions (3) and (4). A positive
function $L$ will be called a slowly varying function in the sense of Zygmund (1968) if, for any
$\delta > 0$, $x \mapsto x^{-\delta}L(x)$ is decreasing and $x \mapsto x^{\delta}L(x)$ is increasing.
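
As an illustration of conditions (3) and (4), the coefficients of fractionally integrated noise F(d) can be generated by expanding $(1-B)^{d}$ and $(1-B)^{-d}$ term by term. The sketch below is only indicative: the function names `fd_ar_coeffs` and `fd_ma_coeffs` and the choice d = 0.3 are assumptions made for the example, not part of the paper.

```python
import numpy as np

def fd_ar_coeffs(d, n):
    """a_0..a_n with (1 - B)^d = sum_j a_j B^j, so eps_n = sum_j a_j X_{n-j}."""
    a = np.empty(n + 1)
    a[0] = 1.0
    for j in range(1, n + 1):
        a[j] = a[j - 1] * (j - 1 - d) / j
    return a

def fd_ma_coeffs(d, n):
    """b_0..b_n with (1 - B)^(-d) = sum_j b_j B^j, so X_n = sum_j b_j eps_{n-j}."""
    b = np.empty(n + 1)
    b[0] = 1.0
    for j in range(1, n + 1):
        b[j] = b[j - 1] * (j - 1 + d) / j
    return b

d = 0.3  # illustrative value in ]0, 1/2[
a = fd_ar_coeffs(d, 5000)
b = fd_ma_coeffs(d, 5000)
j = np.arange(1, 5001)
# If a_j behaves like C j^(-d-1) and b_j like C' j^(d-1), these rescaled
# sequences should level off for large j.
scaled_a = a[1:] * j ** (d + 1)
scaled_b = b[1:] * j ** (1 - d)
```

The two rescaled sequences flattening out is what the power bounds (3) and (4) look like numerically in this special case.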

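Condition (4) implies that the autocovariance function $\sigma$ of the process
$(X_n)_{n\in\mathbb{Z}}$ verifies:

\[ \forall \delta > 0,\ \exists C_3 \in \mathbb{R},\quad |\sigma(j)| \le C_3\, j^{2d-1+\delta}. \tag{5} \]

Indeed, if $\delta < \frac{1-2d}{2}$ and $k \ge 1$:

\begin{align*}
|\sigma(k)| &= \sigma_\varepsilon^2 \Big| \sum_{j=0}^{+\infty} b_j b_{j+k} \Big|
 \le \sigma_\varepsilon^2 \sum_{j=0}^{+\infty} |b_j b_{j+k}| \\
&\le \sigma_\varepsilon^2 |b_0 b_k| + C_2^2 \sigma_\varepsilon^2 \sum_{j=1}^{+\infty} j^{d-1+\delta}(k+j)^{d-1+\delta} \\
&\le \sigma_\varepsilon^2 |b_0|\, C_2\, k^{d-1+\delta} + C_2^2 \sigma_\varepsilon^2 \int_0^{+\infty} j^{d-1+\delta}(k+j)^{d-1+\delta}\, dj \\
&\le C_3\, k^{2d-1+2\delta}.
\end{align*}

Notice that it suffices to prove (5) for $\delta$ near 0 in order to verify (5) for $\delta > 0$
arbitrarily chosen.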
More accurately, Inoue (1997) has proved that if:

\[ b_j \sim L(j)\, j^{d-1} \]

then

\[ \sigma(j) \sim \sigma_\varepsilon^2\, L(j)^2\, \beta(1-2d, d)\, j^{2d-1} \]

where $L$ is a slowly varying function and $\beta$ is the beta function. The converse is not
true: we must have more assumptions about the series $(b_j)_{j\in\mathbb{N}}$ in order to get an
asymptotic equivalent for $(\sigma(j))_{j\in\mathbb{N}}$ (see Inoue (2000)).
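
To make the bound (5) concrete, the following sketch evaluates the autocovariance of fractionally integrated noise F(d), using the closed form quoted in the proof of Proposition 2.1.1 rewritten as a one-step recursion. The helper name `fd_autocov`, the value d = 0.3 and the choice $\sigma_\varepsilon^2 = 1$ are assumptions made for illustration.

```python
import math
import numpy as np

def fd_autocov(d, n, sigma2=1.0):
    """sigma(0..n) for F(d): sigma(0) = sigma2 * Gamma(1-2d) / Gamma(1-d)^2,
    then sigma(j) = sigma(j-1) * (j-1+d) / (j-d)."""
    s = np.empty(n + 1)
    s[0] = sigma2 * math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for j in range(1, n + 1):
        s[j] = s[j - 1] * (j - 1 + d) / (j - d)
    return s

d = 0.3  # illustrative value
sigma = fd_autocov(d, 20000)
lags = np.arange(1, 20001)
# sigma(j) * j^(1-2d) should approach Gamma(1-2d) / (Gamma(d) Gamma(1-d)).
scaled = sigma[1:] * lags ** (1 - 2 * d)
limit = math.gamma(1 - 2 * d) / (math.gamma(d) * math.gamma(1 - d))
# Partial sums keep growing, in line with the lack of absolute summability.
partial_sums = np.cumsum(sigma)
```

Here the decay exponent $2d-1$ is visible directly, and the growing partial sums illustrate the long-memory property assumed in this section.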



2 Wiener-Kolmogorov Prediction Theory 

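The aim of this part is to compute the best linear one-step predictor (with minimum mean-squared
distance from the true random variable) knowing all the past $\{X_{k+1-j},\ j \ge 1\}$. Our
predictor is therefore an infinite linear combination of the infinite past:

\[ \hat{X}_k(1) = \sum_{j=0}^{\infty} \lambda(j) X_{k-j} \]

where $(\lambda(j))_{j\in\mathbb{N}}$ are chosen to ensure that the mean-squared prediction error

\[ E\big[(\hat{X}_k(1) - X_{k+1})^2\big] \]

is as small as possible. Following Whittle (1963), and in view of the moving average
representation of $(X_n)_{n\in\mathbb{Z}}$, we may rewrite our predictor $\hat{X}_k(1)$ as:

\[ \hat{X}_k(1) = \sum_{j=0}^{\infty} \phi(j)\, \varepsilon_{k-j} \]

where $(\phi(j))_{j\in\mathbb{N}}$ depends only on $(\lambda(j))_{j\in\mathbb{N}}$ and on the
coefficients $(b_j)_{j\in\mathbb{N}}$ defined in (1). From the infinite moving average
representation of $(X_n)_{n\in\mathbb{Z}}$ given in (1), with the normalisation $b_0 = 1$, we can
rewrite the mean-squared prediction error as:

\begin{align*}
E\big[(\hat{X}_k(1) - X_{k+1})^2\big]
&= E\Big[\Big(\sum_{j=0}^{\infty} \phi(j)\varepsilon_{k-j} - \sum_{j=0}^{\infty} b_j \varepsilon_{k+1-j}\Big)^2\Big] \\
&= E\Big[\Big(\varepsilon_{k+1} - \sum_{j=0}^{\infty} (\phi(j) - b_{j+1})\varepsilon_{k-j}\Big)^2\Big] \\
&= \Big(1 + \sum_{j=0}^{\infty} (b_{j+1} - \phi(j))^2\Big)\sigma_\varepsilon^2
\end{align*}

since the random variables $(\varepsilon_n)_{n\in\mathbb{Z}}$ are uncorrelated with variance
$\sigma_\varepsilon^2$. The smallest mean-squared prediction error is obtained when setting
$\phi(j) = b_{j+1}$ for $j \ge 0$.

The smallest prediction error of $(X_n)_{n\in\mathbb{Z}}$ is $\sigma_\varepsilon^2$ within the
class of linear predictors. Furthermore, if

\[ A(z) = \sum_{j=0}^{+\infty} a_j z^j \]

denotes the characteristic polynomial of the $(a_j)_{j\in\mathbb{N}}$ and

\[ B(z) = \sum_{j=0}^{+\infty} b_j z^j \]

that of the $(b_j)_{j\in\mathbb{N}}$, then in view of the identity $A(z) = B(z)^{-1}$, $|z| < 1$,
we may write:

\[ \hat{X}_k(1) = -\sum_{j=1}^{+\infty} a_j X_{k+1-j}. \tag{6} \]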
2.1 Mean Squared Prediction Error when the Predictor is Truncated

In practice, we only know a finite part of the past, the one which we have observed. So the
predictor should only depend on the observations. Assume that we only know the set
$\{X_1, \dots, X_k\}$ and that we replace the unknown values by 0; then we have the following new
predictor:

\[ \hat{X}'_k(1) = -\sum_{j=1}^{k} a_j X_{k+1-j}. \tag{7} \]

It is equivalent to say that we have truncated the infinite series (6) to k terms. The following
proposition provides the asymptotic properties of the mean-squared prediction error as a function
of k.
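
The truncated predictor is a finite dot product between the first k autoregressive coefficients and the observed values in reverse time order. The sketch below shows this for fractionally integrated noise; the function names, the sample values in `x` and the choice d = 0.3 are hypothetical and only serve the illustration.

```python
import numpy as np

def fd_ar_coeffs(d, n):
    """a_0..a_n with (1 - B)^d = sum_j a_j B^j (F(d) case, used as an example)."""
    a = np.empty(n + 1)
    a[0] = 1.0
    for j in range(1, n + 1):
        a[j] = a[j - 1] * (j - 1 - d) / j
    return a

def truncated_wk_forecast(x, a):
    """One-step forecast -sum_{j=1}^{k} a_j X_{k+1-j} from x = (X_1, ..., X_k)."""
    x = np.asarray(x, dtype=float)
    k = x.size
    # x[::-1] is (X_k, ..., X_1), which lines up with (a_1, ..., a_k).
    return -float(np.dot(a[1:k + 1], x[::-1]))

x = np.array([0.2, -0.1, 0.4, 0.3])   # hypothetical observations X_1..X_4
a = fd_ar_coeffs(0.3, x.size)
forecast = truncated_wk_forecast(x, a)
```

Unobserved values before $X_1$ are implicitly replaced by 0, which is exactly the truncation described above.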

Proposition 2.1.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a linear stationary process defined by (1), (2)
and possessing the features (3) and (4). We can approximate the mean-squared prediction error of
$\hat{X}'_k(1)$ by:

\[ \forall \delta > 0,\quad E\big([X_{k+1} - \hat{X}'_k(1)]^2\big) = \sigma_\varepsilon^2 + O(k^{-1+\delta}). \]

Furthermore, this rate of convergence $O(k^{-1})$ is optimal since for fractionally integrated
noise, we have the following asymptotic equivalent:

\[ E\big([X_{k+1} - \hat{X}'_k(1)]^2\big) = \sigma_\varepsilon^2 + C k^{-1} + o(k^{-1}). \]

We note that the prediction error is the sum of $\sigma_\varepsilon^2$, the error of the
Wiener-Kolmogorov model, and the error due to the truncation to k terms, which is bounded by
$O(k^{-1+\delta})$ for all $\delta > 0$.
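
The rate in Proposition 2.1.1 can be explored numerically for F(d), because the error of the truncated predictor is the finite quadratic form $\sum_{j,l=0}^{k} a_j a_l \sigma(j-l)$ (with $a_0 = 1$). The sketch below assumes $\sigma_\varepsilon^2 = 1$ and d = 0.3; all function names are illustrative.

```python
import math
import numpy as np

def fd_ar_coeffs(d, n):
    a = np.empty(n + 1)
    a[0] = 1.0
    for j in range(1, n + 1):
        a[j] = a[j - 1] * (j - 1 - d) / j
    return a

def fd_autocov(d, n):
    s = np.empty(n + 1)
    s[0] = math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for j in range(1, n + 1):
        s[j] = s[j - 1] * (j - 1 + d) / (j - d)
    return s

def truncated_mse(d, k):
    """E[(X_{k+1} - truncated forecast)^2] for F(d) with unit innovation variance."""
    a = fd_ar_coeffs(d, k)
    s = fd_autocov(d, k)
    idx = np.arange(k + 1)
    cov = s[np.abs(idx[:, None] - idx[None, :])]   # Toeplitz covariance matrix
    return float(a @ cov @ a)

d = 0.3
ks = [100, 200, 400]
excess = {k: truncated_mse(d, k) - 1.0 for k in ks}
# If the excess over sigma_eps^2 is of order 1/k, k * excess should be roughly flat.
scaled = {k: k * excess[k] for k in ks}
```

Doubling k should roughly halve the excess error, which is the numerical signature of the $k^{-1}$ rate.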



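Proof.

\begin{align*}
X_{k+1} - \hat{X}'_k(1) &= X_{k+1} - \hat{X}_k(1) + \hat{X}_k(1) - \hat{X}'_k(1) \\
&= X_{k+1} - \sum_{j=0}^{+\infty} b_{j+1}\varepsilon_{k-j} - \sum_{j=k+1}^{+\infty} a_j X_{k+1-j} \\
&= \varepsilon_{k+1} - \sum_{j=k+1}^{+\infty} a_j X_{k+1-j}. \tag{8}
\end{align*}

The two parts of the sum (8) are orthogonal for the inner product associated with the mean-square
norm. Consequently:

\[ E\big([X_{k+1} - \hat{X}'_k(1)]^2\big) = \sigma_\varepsilon^2 + \sum_{j=k+1}^{\infty}\sum_{l=k+1}^{\infty} a_j a_l\, \sigma(l-j). \]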

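For the second term of the sum we have:

\begin{align*}
\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)\Big|
&= \Big| 2\sum_{j=k+1}^{+\infty}\sum_{l=j+1}^{+\infty} a_j a_l \sigma(l-j) + \sum_{j=k+1}^{+\infty} a_j^2 \sigma(0) \Big| \\
&\le 2\sum_{j=k+1}^{+\infty} |a_j||a_{j+1}||\sigma(1)| + \sum_{j=k+1}^{+\infty} a_j^2 \sigma(0)
 + 2\sum_{j=k+1}^{+\infty}\sum_{l=j+2}^{+\infty} |a_j||a_l||\sigma(l-j)|
\end{align*}

from the triangle inequality. It follows that:

\begin{align*}
\Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)\Big|
&\le C_1^2 C_3 \Big( 2\sum_{j=k+1}^{+\infty} j^{-d-1+\delta}(j+1)^{-d-1+\delta} + \sum_{j=k+1}^{+\infty} \big(j^{-d-1+\delta}\big)^2 \Big) \tag{9} \\
&\quad + 2 C_1^2 C_3 \sum_{j=k+1}^{+\infty}\sum_{l=j+2}^{+\infty} j^{-d-1+\delta}\, l^{-d-1+\delta}\, |l-j|^{2d-1+\delta} \tag{10}
\end{align*}

for all $\delta > 0$ from inequalities (3) and (5). Assume now that $\delta < 1/2 - d$. For the
terms (9), since $j \mapsto j^{-d-1+\delta}(j+1)^{-d-1+\delta}$ is a positive and decreasing
function on $\mathbb{R}^+$, we have the following approximations:

\begin{align*}
2C_1^2C_3 \sum_{j=k+1}^{+\infty} j^{-d-1+\delta}(j+1)^{-d-1+\delta}
&\le 2C_1^2C_3 \int_k^{+\infty} j^{-d-1+\delta}(j+1)^{-d-1+\delta}\, dj \\
&\le \frac{2C_1^2C_3}{1+2d-2\delta}\, k^{-2d-1+2\delta}.
\end{align*}

Since the function $j \mapsto (j^{-d-1+\delta})^2$ is also positive and decreasing, we can
establish in a similar way that:

\begin{align*}
C_1^2C_3 \sum_{j=k+1}^{+\infty} \big(j^{-d-1+\delta}\big)^2
&\le C_1^2C_3 \int_k^{+\infty} \big(j^{-d-1+\delta}\big)^2 dj \\
&= \frac{C_1^2C_3}{1+2d-2\delta}\, k^{-2d-1+2\delta}.
\end{align*}

For the infinite double series (10), we will similarly compare the series with an integral. In the
next lemma, we establish the necessary result for this comparison:

Lemma 2.1.1. Let $g$ be the function $(l,j) \mapsto j^{-d-1+\delta}\, l^{-d-1+\delta}\, |l-j|^{2d-1+\delta}$.
Let $m$ and $n$ be two positive integers. We assume that $\delta < 1-2d$ and
$m \ge \frac{\delta-d-1}{\delta+2d-1}$. We will call $A_{n,m}$ the square $[n, n+1] \times [m, m+1]$.
If $n \ge m+1$ then

\[ \int_{A_{n,m}} g(l,j)\, dj\, dl \ge g(n+1, m). \]

Proof. See Appendix 4.1.

□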



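Assume now that $\delta < 1-2d$ without loss of generality. Thanks to the previous lemma and the
bounds obtained for the terms (9), there exists $K \in \mathbb{N}$ such that if $k > K$:

\[ \Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)\Big|
 \le C \int_{k+1}^{+\infty} j^{-d-1+\delta} \Big[ \int_{j}^{+\infty} l^{-d-1+\delta}(l-j)^{2d-1+\delta}\, dl \Big] dj + O\big(k^{-2d-1+2\delta}\big). \]

In the integral over $l$, by using the substitution $jl' = l$, we obtain:

\[ \Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)\Big|
 \le C \int_{k+1}^{+\infty} j^{-2+3\delta} \int_{1}^{+\infty} l^{-d-1+\delta}(l-1)^{2d-1+\delta}\, dl\, dj + O\big(k^{-2d-1+2\delta}\big). \]

Since, if $\delta < (1-d)/2$,

\[ \int_{1}^{+\infty} l^{-d-1+\delta}(l-1)^{2d-1+\delta}\, dl < +\infty, \]

it follows that:

\[ \Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)\Big| \le O\big(k^{-1+3\delta}\big) + O\big(k^{-2d-1+2\delta}\big). \]

If $\delta > 0$, $\delta < 1-2d$ and $\delta < (1-d)/2$, we have:

\[ \Big|\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)\Big| = O\big(k^{-1+3\delta}\big). \tag{11} \]

Notice that if the equality is true under the assumptions $\delta > 0$, $\delta < 1-2d$ and
$\delta < (1-d)/2$, it is also true for any $\delta > 0$. Therefore we have proven the first part
of the proposition.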
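We now prove that there exist long-memory processes whose prediction error attains the rate of
convergence $k^{-1}$. Assume now that $(X_n)_{n\in\mathbb{Z}}$ is fractionally integrated noise
F(d), which is the stationary solution of the difference equation:

\[ X_n = (1-B)^{-d}\varepsilon_n \tag{12} \]

with $B$ the usual backward shift operator, $(\varepsilon_n)_{n\in\mathbb{Z}}$ a white-noise series
and $d \in\, ]0, 1/2[$ (see for example Brockwell and Davis (1991)). We can compute the coefficients
and obtain that:

\[ \forall j > 0,\ a_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)}
 \quad\text{and}\quad
 \forall j \ge 0,\ \sigma(j) = \frac{(-1)^j\,\Gamma(1-2d)}{\Gamma(j-d+1)\Gamma(1-j-d)}\,\sigma_\varepsilon^2; \]

then we have:

\[ \forall j > 0,\ a_j < 0 \quad\text{and}\quad \forall j \ge 0,\ \sigma(j) > 0 \]

and

\[ a_j \sim \frac{j^{-d-1}}{\Gamma(-d)} \quad\text{and}\quad \sigma(j) \sim \frac{j^{2d-1}\,\Gamma(1-2d)}{\Gamma(d)\Gamma(1-d)}\,\sigma_\varepsilon^2 \quad\text{when } j \to \infty. \]

In this particular case, we can estimate the prediction error more precisely:

\begin{align*}
\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l \sigma(l-j)
&\sim \frac{\Gamma(1-2d)\,\sigma_\varepsilon^2}{\Gamma(-d)^2\Gamma(d)\Gamma(1-d)} \int_{k+1}^{+\infty} j^{-2} \int_{1}^{+\infty} l^{-d-1}(l-1)^{2d-1}\, dl\, dj + O\big(k^{-2d-1}\big) \\
&\sim \frac{\Gamma(1-2d)\Gamma(2d)\,\sigma_\varepsilon^2}{\Gamma(-d)^2\Gamma(d)\Gamma(1+d)}\, k^{-1}. \tag{13}
\end{align*}

The asymptotic bound $O(k^{-1})$ is therefore as small as possible.

□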
In the specific case of fractionally integrated noise, we may write the prediction error as:

\[ E\big([X_{k+1} - \hat{X}'_k(1)]^2\big) = \sigma_\varepsilon^2 + C(d)\,k^{-1} + o(k^{-1}) \]

and, for $\sigma_\varepsilon^2 = 1$, we can express $C(d)$ as a function of $d$:

\[ C(d) = \frac{\Gamma(1-2d)\Gamma(2d)}{\Gamma(-d)^2\Gamma(d)\Gamma(1+d)}. \tag{14} \]

It is easy to prove that $C(d) \to +\infty$ as $d \to 1/2$ and we may write the following
asymptotic equivalent as $d \to 1/2$:

\[ C(d) \sim \frac{1}{(1-2d)\,\Gamma(-1/2)^2\,\Gamma(1/2)\,\Gamma(3/2)}. \tag{15} \]

As $d \to 0$, $C(d) \to 0$ and we have the following equivalent as $d \to 0$:

\[ C(d) \sim d^2. \]

Figure 2.1: The constant $C(d)$, $d \in [0, 1/2[$, defined in (14).

As Figure 2.1 suggests and the asymptotic equivalent given in (15) proves, the mean-squared
error tends to $+\infty$ as $d \to 1/2$. By contrast, the constant $C(d)$ takes small values for
$d$ in a large interval of $[0, 1/2[$. Although the rate of convergence has a constant order
$k^{-1}$, the forecast error is bigger when $d \to 1/2$. This result is not surprising since the
correlation between the random variable which we want to predict and the random variables which
we take equal to 0 increases when $d \to 1/2$.



2.2 Estimates of Forecast Coefficients and the Associated Mean Square Error 

We will now estimate the mean-squared error between the predictor $\hat{X}'_k(1)$ defined in (7)
and the predictor defined as:

\[ \hat{X}'_{T,k}(1) = -\sum_{j=1}^{k} \hat{a}_j X_{k+1-j} \]

where $\hat{a}_j$ are estimates of $a_j$ computed using a length-T realisation of the process.
More precisely, we consider a parametric approach and we assume that:

\[ a_j = a_j(\theta) \text{ with } \theta \text{ an unknown vector in } \Theta \]

where $\Theta$ is a compact subset of $\mathbb{R}^p$. Assume that the process
$(Y_n)_{n\in\mathbb{Z}}$ is Gaussian. Let $\theta_0$ be the true value of the parameter. We assume
the realisation $(Y_n)_{1\le n\le T}$ to be known. We estimate the $(a_j)_{1\le j\le k}$ by
$\hat{a}_j := a_j(\hat{\theta}_T)$ where $\hat{\theta}_T$ is an estimate of $\theta_0$, for example
the Whittle estimate. In order to use the Whittle estimate and follow the approach suggested in
Fox and Taqqu (1986), we assume from now on that all the processes in the parametric class have
a spectral density denoted by $f(\cdot,\theta)$. We define the Whittle estimate by (see Fox and
Taqqu (1986)):

\[ \hat{\theta}_T = \operatorname*{argmin}_{\theta\in\Theta}\ \frac{1}{2\pi}\int_{-\pi}^{\pi} [f(\lambda,\theta)]^{-1} I_T(\lambda)\, d\lambda \tag{16} \]

where $I_T$ is the periodogram:

\[ I_T(\lambda) = \frac{\big|\sum_{j=1}^{T} Y_j e^{ij\lambda}\big|^2}{2\pi T}. \]
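
A rough numerical counterpart of (16) for fractionally integrated noise is sketched below. It is not the authors' procedure: the integral is replaced by a sum over Fourier frequencies, the minimisation is a plain grid search, and the sample is generated by a truncated moving average, which only approximates F(d). The names `simulate_fd`, `periodogram` and `whittle_d`, the seed and the sample size are all assumptions made for the example.

```python
import numpy as np

def fd_ma_coeffs(d, n):
    b = np.empty(n + 1)
    b[0] = 1.0
    for j in range(1, n + 1):
        b[j] = b[j - 1] * (j - 1 + d) / j
    return b

def simulate_fd(d, T, rng, burn=5000):
    """Approximate F(d) sample: moving average truncated at `burn` lags."""
    b = fd_ma_coeffs(d, burn)
    eps = rng.standard_normal(T + burn)
    return np.convolve(eps, b, mode="valid")[:T]

def periodogram(y):
    """I_T at frequencies 2*pi*j/T, j = 1..floor((T-1)/2), normalised by 2*pi*T."""
    T = y.size
    m = (T - 1) // 2
    dft = np.fft.fft(y)[1:m + 1]
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    return lam, np.abs(dft) ** 2 / (2 * np.pi * T)

def whittle_d(y, grid=None):
    """Grid-search minimiser of a Riemann-sum version of (16) with
    f(lam, d) proportional to |1 - exp(i*lam)|^(-2d)."""
    if grid is None:
        grid = np.linspace(0.01, 0.49, 97)
    lam, I = periodogram(y)
    log_g = np.log(2 * np.sin(lam / 2))       # log |1 - exp(i*lam)|
    # 1/f is proportional to |1 - exp(i*lam)|^(2d); constants do not move the argmin.
    objective = [np.mean(I * np.exp(2 * dd * log_g)) for dd in grid]
    return float(grid[int(np.argmin(objective))])

rng = np.random.default_rng(0)
y = simulate_fd(0.3, 4000, rng)
d_hat = whittle_d(y)
```

The estimated coefficients $\hat{a}_j = a_j(\hat{d})$ would then be obtained by plugging `d_hat` into the closed form of $a_j$.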



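Before we state the theorem, we give assumptions on the regularity of the spectral densities
in our parametric class. Under those standard conditions, the estimated vector converges to the
true parameter if the process is a Gaussian long-memory time series (see Fox and Taqqu (1986)).
We will refer to the following assumptions.

We say that $f(\lambda,\theta)$ satisfies conditions A0-A6 if there exists $0 < \alpha(\theta) < 1$
such that for each $\delta > 0$:

A0. $f(\lambda,\theta_0) = |\lambda|^{-\alpha(\theta_0)} L(\lambda,\theta_0)$ with $L(\cdot,\theta_0)$ bounded. $L(\cdot,\theta_0)$ is differentiable at 0 and $L(0,\theta_0) \ne 0$.

A1. $\theta \mapsto \int_{-\pi}^{\pi} f(\lambda,\theta)\,d\lambda < +\infty$ can be twice differentiated under the integral sign.

A2. $f(\lambda,\theta)$ is continuous at all $(\theta,\lambda)$, $\lambda \ne 0$, $f^{-1}(\lambda,\theta)$ is continuous at all $(\theta,\lambda)$ and
\[ f(\lambda,\theta) = O\big(|\lambda|^{-\alpha(\theta)-\delta}\big) \text{ as } \lambda \to 0. \]

A3. $(\partial/\partial\theta_j) f^{-1}(\lambda,\theta)$ and $(\partial^2/\partial\theta_j\partial\theta_l) f^{-1}(\lambda,\theta)$ are continuous at all $(\theta,\lambda)$,
\[ \forall 1 \le j \le p,\quad \frac{\partial}{\partial\theta_j} f^{-1}(\lambda,\theta) = O\big(|\lambda|^{\alpha(\theta)-\delta}\big) \text{ as } \lambda \to 0 \]
and
\[ \forall 1 \le j,l \le p,\quad \frac{\partial^2}{\partial\theta_j\partial\theta_l} f^{-1}(\lambda,\theta) = O\big(|\lambda|^{\alpha(\theta)-\delta}\big) \text{ as } \lambda \to 0. \]

A4. $(\partial/\partial\lambda) f(\lambda,\theta)$ is continuous at all $(\theta,\lambda)$, $\lambda \ne 0$, and
\[ \frac{\partial}{\partial\lambda} f(\lambda,\theta) = O\big(|\lambda|^{-\alpha(\theta)-1-\delta}\big) \text{ as } \lambda \to 0. \]

A5. $(\partial^2/\partial\theta_j\partial\lambda) f^{-1}(\lambda,\theta)$ are continuous at all $(\theta,\lambda)$, $\lambda \ne 0$, and
\[ \forall 1 \le j \le p,\quad \frac{\partial^2}{\partial\theta_j\partial\lambda} f^{-1}(\lambda,\theta) = O\big(|\lambda|^{\alpha(\theta)-1-\delta}\big) \text{ as } \lambda \to 0. \]

A6. $(\partial^3/\partial\theta_j\partial^2\lambda) f^{-1}(\lambda,\theta)$ are continuous at all $(\theta,\lambda)$, $\lambda \ne 0$, and
\[ \forall 1 \le j \le p,\quad \frac{\partial^3}{\partial\theta_j\partial^2\lambda} f^{-1}(\lambda,\theta) = O\big(|\lambda|^{\alpha(\theta)-2-\delta}\big) \text{ as } \lambda \to 0. \]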

We can now express the asymptotic behaviour of the mean-squared prediction error due to the
estimation of the forecast coefficients. We assume in this section that the process is Gaussian.
Let $(X_j)_{j\in\mathbb{Z}}$ be a stochastic process which verifies the assumptions of Section 1
and let $(Y_j)_{j\in\mathbb{Z}}$ be a process which is independent of $(X_j)_{j\in\mathbb{Z}}$, but
has the same stochastic structure. We want to predict $X_{k+1}$ knowing $(X_j)_{j\in[[1,k]]}$, and
we assume that the parameter, and so the forecast coefficients, are estimated based on a
realisation $(Y_j)_{j\in[[1,T]]}$.

Theorem 2.2.1. Let $(X_n)_{n\in\mathbb{Z}}$ be a stationary Gaussian long-memory sequence with
mean 0 and spectral density $f(\lambda,\theta)$, where $\theta \in \Theta$ is an unknown parameter.
The set $\Theta$ is assumed to be compact. We assume also that $\theta_0$, the true value of the
parameter, is in the interior of $\Theta$ and that, for all $\theta \in \Theta$, the conditions
A1-A6 hold. Moreover we assume that each process $(Z_n)_{n\in\mathbb{Z}}$ in our parametric class
with $\theta \in \Theta$ admits an autoregressive representation:

\[ \varepsilon_n = \sum_{j=0}^{\infty} a_j(\theta) Z_{n-j} \]

where $(\varepsilon_n)_{n\in\mathbb{Z}}$ is a Gaussian white noise. Assume also that
$f(\lambda,\theta_0)$ verifies A0 and that for any $j \in \mathbb{N}$, $a_j$ verifies:

(i) $a_j$ is uniformly bounded on a neighbourhood of $\theta_0$;

(ii) the first and second derivatives of $a_j$ are continuous and bounded on a neighbourhood of $\theta_0$;

and that:

\[ \forall \delta > 0,\ \exists C_l,\ \forall j \in \mathbb{N}^*,\quad \Big|\frac{\partial a_j}{\partial\theta_l}(\theta_0)\Big| \le C_l\, j^{-1+\delta}. \tag{17} \]

We then have the following result:

\[ E\big(\hat{X}'_{T,k}(1) - \hat{X}'_k(1)\big)^2 = O\Big(\frac{k^{2d}}{T}\Big). \]



An example to which our theorem applies is the class of fractionally integrated processes. In this
case, the parameter $\theta$ is scalar and corresponds to the long-memory parameter $d$.
Assumptions A0-A6 hold for fractionally integrated processes. We denote by $d_0$ the true value
of $d$; then the coefficients $a_j$ are:

\[ a_j(d) := \frac{\Gamma(j-d)}{\Gamma(j+1)\Gamma(-d)}. \]

Since the gamma function $\Gamma$ is analytic on $\mathbb{C} \setminus \{0, -1, -2, \dots\}$, there
exists a neighbourhood of $d_0$ on which the function $a_j$ and its first and second derivatives
are bounded. Finally, when $j \to +\infty$,

\[ \frac{\partial a_j}{\partial d}(d_0) \sim C\, j^{-d_0-1}\log(j) \]

where $C$ is a constant. As a consequence our theorem can be applied to the class of fractionally
integrated noise because all the assumptions hold. Similarly we can also show that the class of
FARIMA time series verifies the assumptions of this theorem.

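Proof. We first define the following vectors:

\[ \alpha_k^* := \big(a_1(\hat\theta_T) - a_1(\theta_0), \dots, a_k(\hat\theta_T) - a_k(\theta_0)\big) \]

where $v^*$ is the transpose of the vector $v$, and

\[ X_k^* := (X_k, \dots, X_1). \]

Since the estimation sample is independent of $(X_n)_{n\in\mathbb{Z}}$, we have:

\begin{align*}
E\big[(\hat{X}'_{T,k}(1) - \hat{X}'_k(1))^2\big]
&= E\big[(\alpha_k^* X_k)^2\big] \\
&= E\big[\operatorname{trace}(\alpha_k^* X_k X_k^* \alpha_k)\big] \\
&= E\big[\operatorname{trace}(\alpha_k \alpha_k^* X_k X_k^*)\big] \\
&= \operatorname{trace}\big(E(\alpha_k \alpha_k^*)\, \Sigma_k\big)
\end{align*}

with

\[ \Sigma_k := E(X_k X_k^*). \]

Let us first study the covariance matrix of the estimated coefficients $E(\alpha_k\alpha_k^*)$. We
can write $(E(\alpha_k\alpha_k^*))_{i,j} = E(g_{i,j}(\hat\theta_T))$ where $g_{i,j}$ is defined by
$g_{i,j} : \theta \mapsto (a_i(\theta) - a_i(\theta_0))(a_j(\theta) - a_j(\theta_0))$. We then use a
Taylor series expansion of order 2 of $g_{i,j}$ and apply Theorem 5.4.3 from Fuller (1976). We
will refer to the following version.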
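If the following assumptions hold:

(i) $\forall m \in [[1,p]]$, $E\big[|\hat\theta_{T,m} - \theta_{0,m}|^3\big] = O(\eta(T))$ where $\hat\theta_{T,m}$ is the $m$-th entry of $\hat\theta_T$;

(ii) $\hat\theta_T \to \theta_0$, P-a.s.;

(iii) $g_{i,j}$ is uniformly bounded on a neighbourhood of $\theta_0$;

(iv) the first and the second derivatives of $g_{i,j}$ are continuous and bounded on a neighbourhood of $\theta_0$;

then

\begin{align*}
E\big(g_{i,j}(\hat\theta_T)\big) &= g_{i,j}(\theta_0) + \sum_{l=1}^{p} E\big(\hat\theta_{T,l} - \theta_{0,l}\big)\frac{\partial g_{i,j}}{\partial\theta_l}(\theta_0) \\
&\quad + \frac{1}{2}\sum_{l=1}^{p}\sum_{n=1}^{p} \frac{\partial^2 g_{i,j}}{\partial\theta_l\partial\theta_n}(\theta_0)\, E\big[(\hat\theta_{T,l} - \theta_{0,l})(\hat\theta_{T,n} - \theta_{0,n})\big] + O(\eta(T)).
\end{align*}

By assumption, conditions (iii) and (iv) hold. We note also that:

\[ g_{i,j}(\theta_0) = 0 \quad\text{and}\quad \forall l \in [[1,p]],\ \frac{\partial g_{i,j}}{\partial\theta_l}(\theta_0) = 0. \]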

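Next we compute the fourth-order moments of $\hat\theta_T - \theta_0$ in order to estimate the
second and the third moments. We define:

\[ \sigma_T(\theta) := \frac{1}{T}\, Y^* A_T(\theta)\, Y = \frac{1}{2\pi}\int_{-\pi}^{\pi} [f(\lambda,\theta)]^{-1} I_T(\lambda)\, d\lambda \]

where $Y^* = (Y_1, \dots, Y_T)$ and

\[ \big(A_T(\theta)\big)_{j,l} := \frac{1}{(2\pi)^2}\int_{-\pi}^{\pi} [f(\lambda,\theta)]^{-1} e^{i(j-l)\lambda}\, d\lambda. \]

We now follow the proof of Fox and Taqqu (1986). Since
$\hat\theta_T = \operatorname{argmin}_\theta \sigma_T(\theta)$ and according to the mean-value
theorem, we have:

\[ \exists\, \theta^* \text{ such that } |\theta^* - \theta_0| \le |\hat\theta_T - \theta_0| \text{ and }
 \hat\theta_T - \theta_0 = -\Big[\frac{\partial^2 \sigma_T}{\partial\theta_i\partial\theta_j}(\theta^*)\Big]^{-1}_{1\le i,j\le p} \frac{\partial\sigma_T}{\partial\theta}(\theta_0). \]

This is justified because $\theta \mapsto [f(\lambda,\theta)]^{-1}$ is twice differentiable with
respect to $\theta$ and all the partial derivatives are integrable on $[-\pi,\pi]$ with respect to
$\lambda$ by assumption A3. It follows from Fox and Taqqu (1986) that:

\[ \frac{\partial^2\sigma_T}{\partial\theta_i\partial\theta_j}(\theta^*) \xrightarrow[T\to+\infty]{} w_{i,j} \quad \text{P-a.s.} \tag{18} \]

where $W := (w_{i,j})_{1\le i,j\le p}$ is a positive definite matrix. Since the matrix norm
$x \mapsto \|x\|_4$ is continuous, there exists $C > 0$ such that $\|W^{-1}\|_4 < C$ and:

\[ \exists M \in \mathbb{N},\ T > M \ \Rightarrow\ \Big\|\Big[\frac{\partial^2\sigma_T}{\partial\theta_i\partial\theta_j}(\theta^*)\Big]^{-1}\Big\|_4 < C \quad \text{P-a.s.} \tag{19} \]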



Using this inequality, we can now estimate the fourth moments for any m G [[l,p]]: 



E 



d 
d9„ 



E 



T 
Let $m \in [[1,p]]$. We define the matrix with $(j,l)$-th entries:

Next we rewrite this expression as: 
d 



E 



-aT[Oo, 



r"^E 




The process is Gaussian, so all the moments are a function of the autocovariances (see
Triantafyllopoulos). In equation (20), we can rewrite each fourth moment in the sum as a linear
combination of products of 4 covariances. We then count how many covariances belong to the set
$S = \{E(Y_{j_1}Y_{j_2}),\ E(Y_{j_3}Y_{j_4}),\ E(Y_{j_5}Y_{j_6}),\ E(Y_{j_7}Y_{j_8})\}$:

1. either we have $E(Y_{j_1}Y_{j_2}) \times C$ and we can distinguish the following possibilities:

• $C = E(Y_{j_3}Y_{j_4})\,E(Y_{j_5}Y_{j_6})\,E(Y_{j_7}Y_{j_8})$, only one possibility, or;

• $C$ has one element in $S$ and no other, which makes 6 possibilities = (3 choices in $S$) × (2
choices for the other covariances), or;

• $C$ has no elements in $S$, which makes 8 possibilities. First choose a complement for $Y_{j_3}$
(4 possibilities) then a complement for $Y_{j_4}$ (only 2 possibilities because the pairs in $S$
are excluded);

2. or $Y_{j_1}$ is with $Y_{j_l}$, $l > 2$, which makes 6 possibilities. Let us assume that $Y_{j_1}$
is associated with $Y_{j_3}$. We can then distinguish the following cases:

• we obtain 2 pairs in $S$, which are consequently $(Y_{j_5},Y_{j_6})$ and $(Y_{j_7},Y_{j_8})$, or;

• we have only one couple in $S$, which makes (2 choices in $S$) × ((3 − 1) choices for the
other covariances), or;

• we have no elements in $S$: either $Y_{j_2}$ is the complement of $Y_{j_4}$ and then we have only
2 possibilities, or we have 4 choices for the complement of $Y_{j_2}$ and only 2 for $Y_{j_4}$.
Finally we have 10 possibilities.
Therefore we obtain:



E 



d 



da 



■aT{Oo 



m 
T T 



\j=i 1=1 

(T T \ " ^ 

^^^j,i'^i3 -^)\ ,32 ,ji "^ih- h ) (j2 - j4 ) 

j=i (=1 / iij2,i3,i4=i 

(T T \ 
YY^j,l'^U ,j2 hi J4 ^js J6 ^ (il - h ) (j4 -j5)(J ih - J2 ) 

J = l '=1 / il,i2,j3J4,i5 J6 = l 

(T T \ ^ ^ 

YY^i'^^^^ Yl J2 -^ia ,H 0" (il - is ) (j2 - j4 ) 

i = l '=1 / ilj2j3j4=l 

(T T \ ^ 

Y Y ~ ^) I hd2h,jJj5,h'^ih - h)(^{ji - k)<y{k - h) 

i = l '=1 / jl,j2,j3,j4„j5,j(i = l 

T 

+ 5T-H0 ^ji,j2^j3,jJj5d6^j7,je(^Ul - h)cr{j4 - J5)(^(j6 - h)(T{js - 02) 

ji ,32 ,3S ,34, ,3b ,36 ,37 j's = 1 

All the terms of this sum are like: 

T 



Y ■ ■ ■ ^32p-i,32p'^Ul - is) • • • 0-(i2p - i2) := Sp^T- 

jl,.:,j2p = l 



(21) 



Note that Sp^T = trace {(Y^T^m)^) and that is the covariance matrix defined by the spectral 
density: 

9 



-/-I (A, Oo) = O (^A"(^o)-'') as A ^ 0, for any 5 > 



by assumption A3. By applying Theorem 1 of Fox and Taqqu (1987), we prove that:



1 ^ 

T Y h,32---h2p-i,32p'^ih-j3)---(r{32p-jl) 



(2vr) 



iivj2p=i 

2p-l i ( f-1 



f-'{X,9o)f{X,eo)] dX 



0(1). 



(22) 



It follows from assumptions A2 and A3 that this integral is always finite. We need a more precise 
result for the term: ^ ^ 

j=i 1=1 
Although this can also be expressed in the form (21), the estimate (22) is not sufficient to conclude.
By Fox and Taqqu (1986) [proof of Theorem 2], we have:



T T 



(23) 



By (22) and (23), we may conclude that

V(5 > 0, E 



d 



■(Jt{6q) 



o(r 



(24) 



Next using the asymptotic estimate of the fourth moments, we can now obtain asymptotic properties 
for the second moments: 

E 



First we have to prove the uniform integrability of $\sqrt{T}(\hat\theta_{T,j} - \theta_{0,j})\,\sqrt{T}(\hat\theta_{T,l} - \theta_{0,l})$:



T^E { (0T,i - ^0,,)' (ot,i - ^0,0 '1 < T\U ((eT,j - e^X] E ( - 





-eo 




d 


( 





if T > M from (|19p . By applying result ()24p . we conclude that: 

= 0(1) 



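We have proved the uniform integrability of
$\sqrt{T}(\hat\theta_{T,j} - \theta_{0,j})\,\sqrt{T}(\hat\theta_{T,l} - \theta_{0,l})$ since, if
$E[X_T^2]$ is bounded uniformly in $T$, then the collection $(X_T)$ is uniformly integrable.
Moreover according to Fox and Taqqu (1986) [Theorem 2]:

\[ \sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\big(0, 4\pi W^{-1}\big) \]

where $W$ is the matrix defined in (18), and we have also the following convergence in law:

\[ h_{j,l}\big(\sqrt{T}(\hat\theta_T - \theta_0)\big) := T(\hat\theta_{T,j} - \theta_{0,j})(\hat\theta_{T,l} - \theta_{0,l}) \xrightarrow{\ \mathcal{L}\ } h_{j,l}(Z) \]

with $Z \sim \mathcal{N}(0, 4\pi W^{-1})$. By the convergence in law and the uniform integrability,
we may apply Theorem 5.4 in Billingsley (1968) to obtain:

\[ E\big[(\hat\theta_{T,j} - \theta_{0,j})(\hat\theta_{T,l} - \theta_{0,l})\big] \sim 4\pi\,(W^{-1})_{j,l}\, T^{-1} \quad\text{as } T \to +\infty. \]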
Now we give an asymptotic bound for the third-order moment by applying the Cauchy-Schwarz
inequality. Using the inequalities (19) and (24), we conclude that:



3C7 > 0, E 



7T,j - f^oj ^ 

< VCT-iT-3+<5, y5 > 
= O (t-^+^^ , V(5 > 

We obtain the following Taylor series for any 6 > 0: 

m m o2 
1=1 n=l ' " 



-2+5 



daiiOo) daj{9o) daj{9o) dai{9o) 



801 d6n ^ d6i 86, 



wr 



1=1 n=l 

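We can now conclude and find an asymptotic equivalent of $E(\alpha_k\alpha_k^*)$. Since $W^{-1}$ is
symmetric:

\[ E(\alpha_k\alpha_k^*) \sim 4\pi T^{-1}\, D\, W^{-1} D^* \tag{25} \]

with

\[ D := \Big(\frac{\partial a_i}{\partial\theta_l}(\theta_0)\Big)_{1\le i\le k,\ 1\le l\le m} \]

where $m$ denotes the dimension of $\theta$. $W^{-1}$ is a positive definite matrix because $W$ is
too. So it can be expressed as:

\[ W^{-1} = P^* \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_m \end{pmatrix} P \]

where $P = (p_{i,j})_{1\le i,j\le m}$ is an orthogonal matrix and the $(\lambda_i)_{1\le i\le m}$ are
the positive eigenvalues of $W^{-1}$. We may rewrite our expression as: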



DW-'D 



1 7~i* 



E 

r=l 

m 

r=l 

where (3r is the vector (^V\-Ylb=i 



n 



1=1 



1=1 



l<i,j<k 



(26) 



l<i<k 



E 



ai{eT) - ai{9o), . . . , akiOr) - afc(6'o) 



/ ^0 \ 



V X-k+1 J 



trace (E (a^a^) S^) 

m 

AttT-^ trace (/?;/?rSA 



r=l 
from (25) and (26). Therefore we obtain:

\[ \operatorname{trace}\big(E(\alpha_k\alpha_k^*)\,\Sigma_k\big) \sim 4\pi T^{-1}\sum_{r=1}^{m} \beta_r^*\, \Sigma_k\, \beta_r \le 4\pi T^{-1}\, \Lambda_k \sum_{r=1}^{m} \|\beta_r\|_2^2 \]

where $\Lambda_k$ is the greatest eigenvalue of $\Sigma_k$. The last inequality is a consequence of
$\Sigma_k$ being a symmetric matrix. Following assumption (17), we have:



V5>0,3Q,ViG 



9^; 



< Cij 



-1+5 



and we can hence estimate ||/3r-||2- Let 6 = 1/2, there exists Ci, . . . , Cm such that: 

fc mm 



3r lli 



j = l h=ll2 = l '2 



-3 



ll = l «2 = 1 

m m 



+ 00 



,■-3 



Cr(^o) 



li=l «2=1 i=i 

where Cr (6n) does not depend on k. 

From Boettcher and Virtanen (2006), the spectral norm of a Toeplitz matrix whose symbol has the
form $\lambda \mapsto |\lambda|^{-\alpha}L(\lambda)$, where $L$ is a bounded function that is
continuous at 0 and does not vanish at 0, is equivalent to $Ck^{\alpha}$ with $C$ constant. We
conclude the proof:

\[ E\Big[\Big(\big(a_1(\hat\theta_T) - a_1(\theta_0), \dots, a_k(\hat\theta_T) - a_k(\theta_0)\big)\, X_k\Big)^2\Big]
 \le C\, 4\pi\, \frac{k^{\alpha(\theta_0)}}{T} \sum_{r=1}^{m} C_r(\theta_0) \]

with $C$ constant.



□ 



2.3 Conclusion 



Prediction with the Wiener-Kolmogorov predictor involves two mean-squared error components:
the first is due to the truncation to k terms and is bounded by $O(k^{-1})$; the second is due to
the estimation of the coefficients $a_j$ from a realisation of the process of length T and is
bounded by $O(k^{2d}/T)$. The mean-squared difference between the best linear predictor
$\hat{X}_k(1)$ and our predictor is given by:

\[ E\big[(\hat{X}'_{T,k}(1) - \hat{X}_k(1))^2\big] = O(k^{-1}) + O\Big(\frac{k^{2d}}{T}\Big). \]

If we want to compare the two types of prediction errors, we need a relation between the rates
at which T and k tend to $+\infty$. For example, if $T = o(k^{2d+1})$, the error due to the
estimation of the coefficients is predominant and gives the bound for the general error.
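
The trade-off between the two bounds can be made concrete by ignoring constants and equating the orders $k^{-1}$ and $k^{2d}/T$, which gives $k = T^{1/(2d+1)}$. The sketch below is only an order-of-magnitude illustration: the unknown constants are set to 1 and the function names are invented for the example.

```python
def error_orders(k, T, d):
    """Orders of the two components with constants set to 1:
    truncation ~ 1/k and estimation ~ k^(2d) / T."""
    return 1.0 / k, k ** (2 * d) / T

def balancing_k(T, d):
    """Value of k at which 1/k = k^(2d) / T, i.e. k = T^(1 / (2d + 1))."""
    return T ** (1.0 / (2 * d + 1))

d, T = 0.25, 10_000          # illustrative values
k_star = balancing_k(T, d)
trunc, est = error_orders(k_star, T, d)
trunc_big, est_big = error_orders(10 * k_star, T, d)
```

For k well above this balancing value, the estimation component dominates, which corresponds to the regime $T = o(k^{2d+1})$ described above.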
Truncating to k terms the series which defines the Wiener-Kolmogorov predictor amounts to using
an AR(k) model for predicting. Therefore in the following section we look for the AR(k) model
which minimizes the forecast error.



3 The Autoregressive Models Fitting Approach 

In this section we shall develop a generalisation of the "autoregressive model fitting" approach
developed by Ray (1993) in the case of fractionally integrated noise F(d) (defined in (12)). We
study asymptotic properties of the forecast mean-squared error when we fit a misspecified AR(k)
model to the long-memory time series $(X_n)_{n\in\mathbb{Z}}$.

3.1 Rationale 

Let $\Phi$ be a $k$-th degree polynomial defined by:

\[ \Phi(z) = 1 - a_{1,k}z - \dots - a_{k,k}z^k. \]

We assume that $\Phi$ has no zeroes on the unit disk. We define the process
$(\eta_n)_{n\in\mathbb{Z}}$ by:

\[ \forall n \in \mathbb{Z},\quad \eta_n = \Phi(B)X_n \]

where $B$ is the backward shift operator. Note that $(\eta_n)_{n\in\mathbb{Z}}$ is not a white-noise
series because $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process and hence does not belong to the
class of autoregressive processes. Since $\Phi$ has no root on the unit disk,
$(X_n)_{n\in\mathbb{Z}}$ admits a moving-average representation in terms of
$(\eta_n)_{n\in\mathbb{Z}}$, as in the fitted AR(k) model:

\[ X_n = \sum_{j=0}^{\infty} c(j)\,\eta_{n-j}. \]

If $(X_n)_{n\in\mathbb{Z}}$ were an AR(k) process associated with the polynomial $\Phi$, the best
next-step linear predictor would be:

\[ \hat{X}_n(1) = \sum_{j=1}^{\infty} c(j)\,\eta_{n+1-j} = a_{1,k}X_n + \dots + a_{k,k}X_{n+1-k} \quad\text{if } n > k. \]

Here $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process which verifies the assumptions of Section 1.
Our goal is to express the polynomial $\Phi$ which minimizes the forecast error and to estimate
this error.



3.2 Mean-Squared Error 

There exist two approaches to define the coefficients of the $k$-th degree polynomial $\Phi$: the
spectral approach and the time approach.

In the time approach, we choose to define the predictor as the projection onto the closed span
of the subset $\{X_n, \dots, X_{n+1-k}\}$ of the Hilbert space $L^2(\Omega, \mathcal{F}, P)$ with
inner product $\langle X, Y\rangle = E(XY)$. Consequently the coefficients of $\Phi$ verify the
equations, which are called the $k$-th order Yule-Walker equations:

\[ \forall j \in [[1,k]],\quad \sum_{i=1}^{k} a_{i,k}\,\sigma(i-j) = \sigma(j). \tag{27} \]
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with 



The mean-squared prediction error is: 

E[(X„(1) - Xn+iY] = c(0)2E(r?2^i) = E(r?2^i). 
We may write the moving average representation of (?7„)„gN in terms of (en)ngN: 

oo min{j,p) 

j=0 k=0 

oo 

j=0 

min{j,p) 

VjGN, t{j)= ^kb{j-k). 

k=0 



Finally we obtain: 



\2_2 



j=0 

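In the spectral approach, minimizing the prediction error is equivalent to minimizing a contrast
between two spectral densities:

\[ \int_{-\pi}^{\pi} \frac{f(\lambda)}{g(\lambda,\Phi)}\, d\lambda \]

where $f$ is the spectral density of $X_n$ and $g(\cdot,\Phi)$ is the spectral density of the AR(p)
process defined by the polynomial $\Phi$ (see for example Yajima (1993)), so that, up to a
multiplicative constant:

\begin{align*}
\int_{-\pi}^{\pi} \frac{f(\lambda)}{g(\lambda,\Phi)}\, d\lambda
&\propto \int_{-\pi}^{\pi} \Big|\sum_{j=0}^{\infty} b_j e^{ij\lambda}\Big|^2 \big|\Phi(e^{i\lambda})\big|^2 d\lambda \\
&= \int_{-\pi}^{\pi} \Big|\sum_{j=0}^{\infty} t(j)\, e^{ij\lambda}\Big|^2 d\lambda \\
&= 2\pi \sum_{j=0}^{\infty} t(j)^2.
\end{align*}

In both approaches we need to minimize $\sum_{j=0}^{\infty} t(j)^2$.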



3.3 Rate of Convergence of the Error by AR(k) Model Fitting

In the next theorem we derive an asymptotic expression for the prediction error by fitting
autoregressive models to the series:

Theorem 3.3.1. Assume that $(X_n)_{n\in\mathbb{Z}}$ is a long-memory process which verifies the
assumptions of Section 1. If $0 < d < \frac{1}{2}$:

\[ E\big[(\hat{X}_k(1) - X_{k+1})^2\big] - \sigma_\varepsilon^2 = O(k^{-1}). \]
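
For fractionally integrated noise, the prediction error of the best AR(k) fit can be computed for every order at once with the Durbin-Levinson recursion, which makes the $k^{-1}$ rate easy to inspect. The sketch below is an illustration under the assumptions $\sigma_\varepsilon^2 = 1$ and d = 0.3; `durbin_levinson_mse` is a name chosen for the example.

```python
import math
import numpy as np

def fd_autocov(d, n):
    s = np.empty(n + 1)
    s[0] = math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2
    for j in range(1, n + 1):
        s[j] = s[j - 1] * (j - 1 + d) / (j - d)
    return s

def durbin_levinson_mse(s, kmax):
    """v[k] = MSE of the best linear predictor based on the k previous values."""
    v = np.empty(kmax + 1)
    v[0] = s[0]
    phi = np.zeros(0)
    for k in range(1, kmax + 1):
        # Partial autocorrelation at lag k.
        kappa = (s[k] - phi @ s[k - 1:0:-1]) / v[k - 1]
        phi = np.concatenate([phi - kappa * phi[::-1], [kappa]])
        v[k] = v[k - 1] * (1.0 - kappa ** 2)
    return v

d = 0.3
v = durbin_levinson_mse(fd_autocov(d, 800), 800)
# If the excess over sigma_eps^2 = 1 is of order 1/k, these values should level off.
scaled = {k: k * (v[k] - 1.0) for k in (200, 400, 800)}
```

The rescaled excess levelling off as k doubles is the numerical counterpart of the $O(k^{-1})$ statement in Theorem 3.3.1.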
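Proof. Since fitting an AR(k) model minimizes the forecast error using k observations, the error
made by truncation is bigger. So, since the truncation method involves an error bounded by
$O(k^{-1})$, we obtain:

\[ E\big[(\hat{X}_k(1) - X_{k+1})^2\big] - \sigma_\varepsilon^2 = O(k^{-1}). \]

Consequently we only need to prove that this rate of convergence is attained. This is the case
for the fractionally integrated processes defined in (12). We want to express the error made when
fitting an AR(k) model in terms of the Wiener-Kolmogorov truncation error. Note first that the
variance of the white-noise series is equal to:

\[ \sigma_\varepsilon^2 = \int_{-\pi}^{\pi} f(\lambda)\,\Big|\sum_{j=0}^{+\infty} a_j e^{ij\lambda}\Big|^2 d\lambda. \]

In the remainder of the proof we write $\Phi_k(z) = \sum_{j=0}^{k} a_{j,k}z^j$ with $a_{0,k} = 1$,
so that $a_{j,k}$ is the opposite of the coefficient used in Section 3.1, and we set $a_{j,k} = 0$
if $j > k$. Therefore in the case of a fractionally integrated process F(d) we need only show that:

\[ \int_{-\pi}^{\pi} f(\lambda)\,\Big|\sum_{j=0}^{k} a_{j,k} e^{ij\lambda}\Big|^2 d\lambda - \sigma_\varepsilon^2 \sim C k^{-1}. \]

We have:

\begin{align*}
&\int_{-\pi}^{\pi} f(\lambda)\,\Big|\sum_{j=0}^{+\infty} a_j e^{ij\lambda}\Big|^2 d\lambda
 - \int_{-\pi}^{\pi} f(\lambda)\,\Big|\sum_{j=0}^{k} a_{j,k} e^{ij\lambda}\Big|^2 d\lambda \\
&\quad= \sum_{j=0}^{+\infty}\sum_{l=0}^{+\infty} (a_j a_l - a_{j,k}a_{l,k})\,\sigma(j-l) \tag{28} \\
&\quad= \sum_{j=0}^{+\infty}\sum_{l=0}^{+\infty} (a_j a_l - a_{j,k}a_l)\,\sigma(j-l) + \sum_{j=0}^{+\infty}\sum_{l=0}^{+\infty} (a_{j,k}a_l - a_{j,k}a_{l,k})\,\sigma(j-l) \\
&\quad= \sum_{j=0}^{+\infty} (a_j - a_{j,k}) \sum_{l=0}^{+\infty} a_l\,\sigma(l-j) + \sum_{j=0}^{k} a_{j,k} \sum_{l=0}^{+\infty} (a_l - a_{l,k})\,\sigma(j-l). \tag{29}
\end{align*}

We first study the first term of the sum (29). For any $j > 0$, we have
$\sum_{l=0}^{+\infty} a_l\,\sigma(l-j) = 0$. Indeed: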



y^ aiXn-i 

3=0 



E y] 



-l^n 



J=0 



y^ aiXn-iX. 

1=0 
oo 

y] a,o-(/ - j) 

oo 

y]a«cT(/ - j) 
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and we conclude that Ylt^ o.ia{l — j) = because (en)nez is an uncorrelated white noise. We can 
thus rewrite the first term of (1291) hke: 



-oo 



'^{aj - aj^k)'^aia{l - j) = (ao - ao,fc) ^ a/(T(/) 

j=0 1=0 1=0 

= 

since oq = ao,fc = 1 according to definition. Next we study the second term of the sum (i29]l : 

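$$\sum_{j=0}^{k} a_{j,k}\sum_{l=0}^{+\infty}(a_l - a_{l,k})\,\sigma(j-l).$$

We obtain that:

$$\sum_{j=0}^{k} a_{j,k}\sum_{l=0}^{+\infty}(a_l - a_{l,k})\,\sigma(j-l) = \sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=1}^{k}(a_l - a_{l,k})\,\sigma(j-l)$$

$$+ \sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l) \qquad (30)$$

$$+ \sum_{j=0}^{k} a_j\sum_{l=1}^{k}(a_l - a_{l,k})\,\sigma(j-l) \qquad (31)$$

$$+ \sum_{j=0}^{k} a_j\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l).$$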



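Similarly we rewrite the term (30) using the Yule-Walker equations:

$$\sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l) = -\sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=0}^{k} a_l\,\sigma(j-l).$$

We then remark that this is equal to (31). Hence it follows that:

$$\sum_{j=0}^{k} a_{j,k}\sum_{l=0}^{+\infty}(a_l - a_{l,k})\,\sigma(j-l) = \sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=1}^{k}(a_l - a_{l,k})\,\sigma(j-l)$$

$$+ 2\sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l)$$

$$+ \sum_{j=0}^{k} a_j\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l). \qquad (32)$$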



In a similar way we can rewrite the third term of the sum (32) using Fubini's theorem:

$$\sum_{j=0}^{k} a_j\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l) = -\sum_{j=k+1}^{+\infty}\sum_{l=k+1}^{+\infty} a_j a_l\,\sigma(j-l).$$

This third term is therefore equal, up to its sign, to the forecast error in the method of prediction by truncation.

In order to compare the prediction error by truncating the Wiener-Kolmogorov predictor and by fitting an autoregressive model to a fractionally integrated process $F(d)$, we need the sign of all the components of the sum (32). For a fractionally integrated noise, we know the explicit formulas for $a_j$ and $\sigma(j)$:

$$a_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\,\Gamma(-d)}, \qquad \sigma(j) = \sigma_\varepsilon^2\,\frac{\Gamma(1-2d)\,\Gamma(j+d)}{\Gamma(j-d+1)\,\Gamma(d)\,\Gamma(1-d)} > 0.$$

In order to get the sign of $a_{j,k} - a_j$ we use the explicit formula given in Brockwell and Davis (1988) and we easily obtain that $a_{j,k} - a_j$ is negative for all $j \in [1,k]$:

$$a_j - a_{j,k} = \frac{\Gamma(j-d)}{\Gamma(j+1)\,\Gamma(-d)} - \frac{\Gamma(k+1)\,\Gamma(j-d)\,\Gamma(k-d-j+1)}{\Gamma(k-j+1)\,\Gamma(j+1)\,\Gamma(-d)\,\Gamma(k-d+1)}$$

$$= -a_j\left(-1 + \frac{\Gamma(k+1)\,\Gamma(k-d-j+1)}{\Gamma(k-j+1)\,\Gamma(k-d+1)}\right)$$

$$= -a_j\left(\frac{k\cdots(k-j+1)}{(k-d)\cdots(k-d-j+1)} - 1\right) > 0$$

since $\forall j \in \mathbb{N}^*$, $a_j < 0$. To give an asymptotic equivalent for the prediction error, we use the sum given in (32). We have the sign of the three terms: the first is negative, the second is positive and the last is negative. Moreover the third is equal, up to its sign, to the forecast error by truncation and we have proved that this asymptotic equivalent has order $O(k^{-1})$. The prediction error by fitting an autoregressive model converges faster to 0 than the error by truncation only if the second term is equivalent to $Ck^{-1}$, with $C$ constant. Consequently, we search for a bound for $a_j - a_{j,k}$ given the explicit formula for these coefficients (see for example Brockwell and Davis (1985)):



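$$a_j - a_{j,k} = \frac{\Gamma(j-d)}{\Gamma(j+1)\,\Gamma(-d)} - \frac{\Gamma(k+1)\,\Gamma(j-d)\,\Gamma(k-d-j+1)}{\Gamma(k-j+1)\,\Gamma(j+1)\,\Gamma(-d)\,\Gamma(k-d+1)}$$

$$= -a_j\left(-1 + \frac{\Gamma(k+1)\,\Gamma(k-d-j+1)}{\Gamma(k-j+1)\,\Gamma(k-d+1)}\right)$$

$$= -a_j\left(\frac{k\cdots(k-j+1)}{(k-d)\cdots(k-d-j+1)} - 1\right)$$

$$= -a_j\left(\prod_{m=0}^{j-1}\left(1 + \frac{d}{k-d-m}\right) - 1\right).$$

Then we use the following inequality:

$$\forall x \in \mathbb{R},\quad 1 + x \le \exp(x)$$

which gives us:

$$a_j - a_{j,k} \le -a_j\left(\exp\left(\sum_{m=0}^{j-1}\frac{d}{k-d-m}\right) - 1\right) \le -a_j\exp\left(d\sum_{m=0}^{j-1}\frac{1}{k-d-m}\right).$$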



According to the previous inequality, we have:



k +00 fc— 1 +00 

j=l l=k+l j=l l=k+l 

+00 

+{ak - ak,k) X -aia{k - I) 

l=k+l 

k-1 / j-l ^ \ +00 



< X-a,exp(dX ^_^_^ X-a,a(i-0 

J=l \ m=0 / «=fe+l 

(fc-l ^ \ +00 

m=0 / Z=fe+1 



j=i ^ l=k+i 

+00 

+(-aik)fci'^ -aia{k-l) 



l=k+l 

As the function $x \mapsto \frac{1}{k-d-x}$ is increasing, we use the integral test. The inequality on the second term follows from:



^ k — d — m 

m=0 
for $k$ large enough. Therefore there exists $K$ such that for all $k \ge K$:

k +00 fc— 1 / / k d W 

^(aj-ttj^k) Yl < X]~"j^''p(^^"( fc_tf-7 )) ^ -aic7{j-l) 

i=l l=k+l j=l ^ ^ ^ ^ l=k+l 

+00 

+(-afc)A;i^ -aia{0) 

l=k+l 

fc— 1 +00 

j=i i=k+i 



+Ck~'^-^kl'^k''^ 



C 



pi r+00 

/ r'-'(i-i)-"/ 

Jl/(k-d) Jl 



{k - df 



+Ck-^2'^-^ 



n/{k-d) 



< C\k-d)-'^+'^ + Ck-^2 



and so the positive term has a smaller asymptotic order than the forecast error made by truncating. Therefore we have proved that in the particular case of $F(d)$ processes, the two prediction errors are equivalent to $Ck^{-1}$ with $C$ constant. □

The two approaches to next-step prediction, by truncation to $k$ terms or by fitting an autoregressive model AR($k$), consequently have a prediction error with the same rate of convergence $k^{-1}$. So it is interesting to study how much the second approach improves the prediction. The following quotient:

$$r(k) := \frac{\sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=1}^{k}(a_l - a_{l,k})\,\sigma(j-l) + 2\sum_{j=1}^{k}(a_{j,k} - a_j)\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l)}{\sum_{j=0}^{k} a_j\sum_{l=k+1}^{+\infty} a_l\,\sigma(j-l)} \qquad (33)$$

is the ratio of the difference between the two prediction errors to the prediction error by truncation, in the particular case of a fractionally integrated noise $F(d)$. Figure 3.1 shows that the prediction by truncation incurs a larger performance loss when $d \to 1/2$. The improvement reaches 50 per cent when $d > 0.3$ and $k > 20$.
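As a rough numerical companion to the discussion above, the following Python sketch evaluates the AR($k$) coefficients of a fractionally integrated noise $F(d)$ from the explicit product form $a_{j,k} = a_j \prod_{m=0}^{j-1} (k-m)/(k-d-m)$, checks that they solve the Yule-Walker equations, and computes the excess error $\mathbb{E}[(\widehat{X}_k(1) - X_{k+1})^2] - \sigma_\varepsilon^2$. It assumes $\sigma_\varepsilon^2 = 1$ and the sign convention $a_0 = a_{0,k} = 1$; the function names are illustrative and not taken from the paper.

```python
import math


def ar_inf_coeffs(d, n):
    """a_0..a_n of the AR(infinity) representation of F(d).

    Uses a_0 = 1 and the ratio a_j / a_{j-1} = (j - 1 - d) / j, which follows
    from a_j = Gamma(j - d) / (Gamma(j + 1) Gamma(-d)).
    """
    a = [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 - d) / j)
    return a


def fd_autocov(d, n):
    """sigma(0..n) of F(d), assuming unit innovation variance."""
    s = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for h in range(1, n + 1):
        s.append(s[-1] * (h - 1 + d) / (h - d))
    return s


def ar_k_coeffs(d, k):
    """a_{0,k}..a_{k,k} via a_{j,k} = a_j * prod_{m<j} (k - m) / (k - d - m)."""
    a = ar_inf_coeffs(d, k)
    out, ratio = [1.0], 1.0
    for j in range(1, k + 1):
        ratio *= (k - (j - 1)) / (k - d - (j - 1))
        out.append(a[j] * ratio)
    return out


def yule_walker_residual(d, k):
    """Largest |sum_l a_{l,k} sigma(j - l)| over j = 1..k (should be ~0)."""
    s, ak = fd_autocov(d, k), ar_k_coeffs(d, k)
    return max(
        abs(sum(ak[l] * s[abs(j - l)] for l in range(k + 1)))
        for j in range(1, k + 1)
    )


def ar_k_excess_error(d, k):
    """Mean-squared AR(k) prediction error minus sigma_eps^2 (= 1 here)."""
    s, ak = fd_autocov(d, k), ar_k_coeffs(d, k)
    # By the Yule-Walker equations the error reduces to sum_l a_{l,k} sigma(l).
    return sum(ak[l] * s[l] for l in range(k + 1)) - 1.0
```

Evaluating `k * ar_k_excess_error(d, k)` for increasing `k` gives a quick way to see that the excess error behaves like a constant times $k^{-1}$ in this special case.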



3.4 Error due to Estimation of the Forecast Coefficients 

Let $(X_j)_{j\in\mathbb{Z}}$ be a stochastic process which satisfies the assumptions of Section 2 and let $(Y_j)_{j\in\mathbb{Z}}$ be a process which is independent of $(X_j)_{j\in\mathbb{Z}}$ but has the same stochastic structure. We want to predict $X_{k+1}$ knowing $(X_j)_{j\in[1,k]}$ and we assume that the forecast coefficients are estimated from a realisation $(Y_j)_{j\in[1,T]}$.

We estimate the forecast coefficients using the Yule-Walker equations (27), where we replace the true covariances by the empirical covariances computed from the realisation $(Y_j)_{j\in[1,T]}$:

$$\widehat{\sigma}(k) = \frac{1}{T}\sum_{t=1}^{T-k} Y_t Y_{t+k}. \qquad (34)$$

There exists a recursive scheme for computing the forecast coefficients. It is known as the Durbin-Levinson or innovation algorithm and it is described for example in Brockwell and Davis (1991).
Figure 3.1: Ratio $r(k)$, $d \in\, ]0, 1/2[$, defined in (33)
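Let $(Y_j)_{j\in\mathbb{Z}}$ be a zero-mean process with autocovariance function $\sigma$. The coefficients $(a_{i,k})_{i\in[1,k]}$ satisfy the $k$th-order Yule-Walker equations:

$$\forall j \in [1,k],\quad \sigma(j) = \sum_{u=1}^{k}\sigma(u-j)\,a_{u,k}.$$

If we let $v(0) = \sigma(0)$ and $a_{1,1} = \sigma(1)/\sigma(0)$, then we have for any integer $n$:

$$a_{n,n} = \left[\sigma(n) - \sum_{j=1}^{n-1} a_{j,n-1}\,\sigma(n-j)\right]\frac{1}{v(n-1)},$$

$$\begin{pmatrix} a_{1,n}\\ \vdots\\ a_{n-1,n}\end{pmatrix} = \begin{pmatrix} a_{1,n-1}\\ \vdots\\ a_{n-1,n-1}\end{pmatrix} - a_{n,n}\begin{pmatrix} a_{n-1,n-1}\\ \vdots\\ a_{1,n-1}\end{pmatrix},$$

$$v(n) = v(n-1)\left[1 - a_{n,n}^2\right].$$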
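The recursion above can be sketched in a few lines of Python. This is a minimal illustration under the sign convention of this section (the predictor is $\sum_{j=1}^{k} a_{j,k} X_{k+1-j}$); the function name `durbin_levinson` and the list-based interface are choices made here for illustration, not part of the paper.

```python
def durbin_levinson(sigma, k):
    """Solve the k-th order Yule-Walker equations recursively.

    sigma: sequence with sigma[h] the autocovariance at lag h, for h = 0..k.
    Returns (a, v) where a = [a_{1,k}, ..., a_{k,k}] and v = v(k) is the
    one-step mean-squared prediction error based on k observations.
    """
    a = []          # coefficients a_{1,n-1}, ..., a_{n-1,n-1} of the previous order
    v = sigma[0]    # v(0) = sigma(0)
    for n in range(1, k + 1):
        # a_{n,n} = [sigma(n) - sum_{j=1}^{n-1} a_{j,n-1} sigma(n-j)] / v(n-1)
        ann = (sigma[n] - sum(a[j - 1] * sigma[n - j] for j in range(1, n))) / v
        # a_{j,n} = a_{j,n-1} - a_{n,n} a_{n-j,n-1} for j = 1..n-1
        a = [a[i] - ann * a[n - 2 - i] for i in range(n - 1)] + [ann]
        # v(n) = v(n-1) (1 - a_{n,n}^2)
        v = v * (1.0 - ann * ann)
    return a, v
```

For an AR(1) autocovariance $\sigma(h) = \phi^h/(1-\phi^2)$ the recursion should return coefficients close to $(\phi, 0, \dots, 0)$ and an error close to 1, which gives a simple sanity check.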

We denote by $(\widehat{a}_{1,k}, \dots, \widehat{a}_{k,k})$ the respective solutions of the Yule-Walker equations obtained by replacing the covariances by their estimates, the empirical covariances defined in (34). Contrary to Section 2.2, the estimation of the forecast coefficients is non-parametric.
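The estimation step described above can be illustrated as follows: compute the empirical covariances with the $1/T$ normalisation and plug them into the Yule-Walker recursion. The sketch below is only illustrative; the helper names are hypothetical, and the AR(1) simulator is a short-memory toy input chosen so that the expected answer is known, not a long-memory process as studied in this paper.

```python
import random


def empirical_autocov(y, h):
    """Empirical covariance at lag h: (1/T) * sum_{t=1}^{T-h} y_t y_{t+h}."""
    T = len(y)
    return sum(y[t] * y[t + h] for t in range(T - h)) / T


def yule_walker_estimate(y, k):
    """Estimated coefficients (a_hat_{1,k}, ..., a_hat_{k,k}) from a realisation y."""
    sigma_hat = [empirical_autocov(y, h) for h in range(k + 1)]
    a, v = [], sigma_hat[0]
    for n in range(1, k + 1):
        # Durbin-Levinson step applied to the empirical covariances.
        ann = (sigma_hat[n] - sum(a[j] * sigma_hat[n - 1 - j] for j in range(n - 1))) / v
        a = [a[j] - ann * a[n - 2 - j] for j in range(n - 1)] + [ann]
        v *= 1.0 - ann * ann
    return a


def simulate_ar1(phi, T, seed=0, burn_in=500):
    """Toy Gaussian AR(1) path (assumption: used only to exercise the estimator)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(T + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out
```

With a long realisation of the toy AR(1) process, `yule_walker_estimate(y, 2)` should return a first coefficient near `phi` and a second coefficient near zero.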

Another way to estimate the coefficients has been considered by Yajima (1993). Our method borrows the idea (see Section 3.2) that the coefficients of the AR($k$) model minimize:

$$\int_{-\pi}^{\pi}\frac{f(\lambda)}{g(\lambda,\Phi)}\,d\lambda. \qquad (35)$$

If we replace in (35) the spectral density by the periodogram

$$I_T(\lambda) = \frac{1}{2\pi T}\left|\sum_{t=1}^{T} Y_t e^{it\lambda}\right|^2$$

then:

$$(\widehat{a}_{1,k}, \dots, \widehat{a}_{k,k}) = \operatorname{argmin}\int_{-\pi}^{\pi}\frac{I_T(\lambda)}{g(\lambda,\Phi)}\,d\lambda. \qquad (36)$$
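One way to see why the periodogram contrast leads back to the empirical covariances is that the Fourier coefficients of the periodogram are exactly those covariances: $\int_{-\pi}^{\pi} I_T(\lambda)e^{ih\lambda}\,d\lambda = \frac{1}{T}\sum_{t} Y_tY_{t+h}$. The Python sketch below checks this identity numerically. It assumes the normalisation $I_T(\lambda) = |\sum_t Y_t e^{it\lambda}|^2/(2\pi T)$; the function names are illustrative.

```python
import cmath
import math


def empirical_autocov(y, h):
    """(1/T) * sum_{t=1}^{T-h} y_t y_{t+h}."""
    T = len(y)
    return sum(y[t] * y[t + h] for t in range(T - h)) / T


def periodogram(y, lam):
    """I_T(lambda) = |sum_t y_t exp(i t lambda)|^2 / (2 pi T)."""
    T = len(y)
    s = sum(y[t] * cmath.exp(1j * (t + 1) * lam) for t in range(T))
    return abs(s) ** 2 / (2 * math.pi * T)


def periodogram_fourier_coeff(y, h):
    """Riemann sum for the integral of I_T(lambda) exp(i h lambda) over one period.

    The periodogram is a trigonometric polynomial of degree T - 1, so a grid
    of 2T equally spaced frequencies integrates it exactly for 0 <= h < T.
    """
    T = len(y)
    N = 2 * T
    total = 0j
    for j in range(N):
        lam = 2 * math.pi * j / N
        total += periodogram(y, lam) * cmath.exp(1j * h * lam)
    return (2 * math.pi / N) * total
```

For instance, the real part of `periodogram_fourier_coeff(y, h)` should agree with `empirical_autocov(y, h)` up to rounding error.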



From now on we incorporate the effects of the estimation of the AR($k$) coefficients using a realisation of length $T$, as $T \to +\infty$, and study the mean-squared prediction error due to this estimation. We define $\widehat{X}_{T,k}(1)$ as the predictor with all the coefficients $a_{j,k}$ replaced by their estimates:

$$\widehat{X}_{T,k}(1) := \sum_{j=1}^{k}\widehat{a}_{j,k}\,X_{k+1-j}.$$

More precisely, we study the mean-squared difference between the predictor with the estimated coefficients $\widehat{a}_{j,k}$ and the predictor with the true coefficients $a_{j,k}$:



E 



E 



XT,fe(i)-Xfc(i; 



[ai k — ai k, • • • , ak,k — ak,k 





( 


f 




— ai^k 


trace 


E 










[ 






— ak,k 










— ai^k 


trace 




[ 








[ 




\ ak,k 


— ak,k 




(oi^ — oi.fc, • • • , Sfc^ — ak,k) 



First we estimate the covariance matrix: 



E 



( oi.fc - ai,A 



[ai^k — oi.fc, • • • ) o-k^k — ak,k) 



\ \ ak,k — ak,k 
For later convenience, we now introduce the vector: 
and the $(k \times k)$ matrix:

We now state the theorem which allows us to conclude.

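Theorem 3.4.1. We assume that the process $(Y_n)_{n\in\mathbb{Z}}$ is Gaussian, that its autocovariance function $\sigma$ verifies:

$$\sigma(j) \sim \lambda j^{2d-1} \quad\text{with } \lambda > 0,$$

that the coefficients $b_j$ of its infinite moving average representation verify:

$$b_j \sim \delta j^{d-1} \quad\text{with } \delta > 0,$$

and finally that the white noise process $(\varepsilon_n)_{n\in\mathbb{Z}}$ is such that $\forall n \in \mathbb{Z}$, $\mathbb{E}(\varepsilon_n^4) < +\infty$. We will denote by $g_{i,j}$ the function: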



9i,j 



where 




(xo, ...,Xk) 



( Xx 

X\ Xo 

V Xk Xk-1 



iUi - ai,k){yj - aj^k) 



Xk 
Xk-1 



xq / 





(xi \ 




\ Xk ) 



Then 



E(5,,,,(a(0),cT(l 

1 - Er=l Or, 



,a{k))) 

2 



k] Cn'^" ^ {^i^'^lk,k^k 

)ln(n) 



-1^ 



where $C$ and $D$ are constants independent of $n$ and $k$. The definition of the matrix $H$ follows. We define $h$ as $h(\lambda) = \left|1 - \sum_{r=1}^{k} a_{r,k}e^{ir\lambda}\right|^2$ and we denote by $h^{(i)}$ the derivative of the function $h$ with respect to $a_{i,k}$. The $(i,j)$-th entry of the matrix $H$ is given by:

$$\int_{-\pi}^{\pi} h^{(i)}(\lambda)\,h^{(j)}(\lambda)\,f^2(\lambda)\,d\lambda.$$

Proof. The proof is given in the Appendix. □

Next we estimate the asymptotic behaviour of $\mathbb{E}\left[\left(\widehat{X}_{T,k}(1) - \widehat{X}_k(1)\right)^2\right]$ and we state the following theorem, which gives an estimate of the mean-squared error when $d \ge 1/4$:



Theorem 3.4.2. We assume that the assumptions of Theorem 3.4.1 hold. We assume also that the spectral density of the process is such that:

$$\forall x \in [-\pi,\pi],\quad f(x) = f_d(x)\,L(x)$$
with $f_d$ defined by:

$$\forall x \in [-\pi,\pi],\quad f_d(x) = \left|2\sin\left(\frac{x}{2}\right)\right|^{-2d}$$

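and $L$ a positive function, integrable on $[-\pi,\pi]$, continuous at 0 and bounded below by a positive constant. If $d = 1/4$ then

$$\mathbb{E}\left[\left(\widehat{X}_{T,k}(1) - \widehat{X}_k(1)\right)^2\right] = O\left(\frac{k^{1/2}\log(T)}{T}\right)$$

and if $d \in\, ]1/4, 1/2[$, we get

$$\mathbb{E}\left[\left(\widehat{X}_{T,k}(1) - \widehat{X}_k(1)\right)^2\right] = O\left(\frac{k^{1-2d}}{T^{2-4d}}\right).$$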



Remark. The assumption that $L$, and so $f$, is bounded below by a positive constant is not a very restrictive new assumption. Since we have assumed that the process admits an infinite autoregressive representation:

$$\varepsilon_n = \sum_{j=0}^{\infty} a_j X_{n-j},$$

where the coefficients $a_j$ are absolutely summable, the spectral density can be written as:

$$f(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}\,\frac{1}{\left|\sum_{j=0}^{\infty} a_j e^{ij\lambda}\right|^2}$$

and consequently the spectral density cannot vanish on $[-\pi,\pi[$.
Proof. Applying the last theorem, we obtain that if d = 1/4 then 

k 



E 



(^(l)-Xfe(l) 
and if dG]l/4,l/2[ then 

(X^k{l) - Xk{l) 



O I trace | { j ^^^1^,^ 

yj=0 



E 



First we estimate 



O I trace 



rp2 



1 / 



i,k I '^k '^k,k 



J=0 



\-^k 



. We write this like: 



k 
j=0 



\ 3=1 j=0 

We follow the proof of Theorem 3.3 of Inoue and Kasahara (2006) about the convergence of the sequence of the misspecified AR($k$) model coefficients to the AR($\infty$) representation coefficients. We remark that there exist $C_1$, $C_2$ and $K$ such that if $k \ge K$:

$$k\,(a_{j,k} - a_j) \le C_1\sum_{u=k-j}^{+\infty}|a_u| + C_2\sum_{u=j}^{+\infty}|a_u|$$

and, if $C$ is a generic constant, for $k \ge K$:

$$k\,(a_{j,k} - a_j) \le C\left(\sum_{u=k-j}^{+\infty}|a_u| + \sum_{u=j}^{+\infty}|a_u|\right).$$



We thus get that i{k> K: 



C 



i=i 

k 



+00 



yU=k—j 



So we may conclude that: 



i=o 



j=0 



0(1). 



Next we have to study the asymptotic properties of: 



trace = (1 . . . 1) S^^ 



/ 1 



V 1 



(37) 



(38) 



Then applying Theorem 6.1 of Adenstedt (1974) under the assumptions of Theorem 3.4.2, we obtain the following asymptotic equivalent:



(1...1) 



/ 1 



V 1 



A;i-^'ir(-2(i+ l)L(O) 
(3{-d+l,-d+l) 



-1 



where $\Gamma$ and $\beta$ are respectively the gamma function and the beta function. The result follows. □



The last case is when $0 < d < 1/4$:



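Theorem 3.4.3. We assume that the assumptions of Theorem 3.4.1 hold. We assume also that the spectral density $f$ is bounded below by a positive constant. If $0 < d < 1/4$ then

$$\mathbb{E}\left[\left(\widehat{X}_{T,k}(1) - \widehat{X}_k(1)\right)^2\right] = O\left(\frac{k}{T}\right).$$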
Proof. We call $\left(\Phi_j(z) := \Phi_{j,1} + \dots + \Phi_{j,j}z^{j-1}\right)_{j\in\mathbb{N}^*}$ the orthonormal polynomials associated with the spectral density $f$, that is to say $\Phi_j$ is a polynomial of degree $j-1$ such that

$$\forall j, l \in \mathbb{N}^*,\quad \int_{-\pi}^{\pi} f(\lambda)\,\Phi_j(e^{i\lambda})\,\Phi_l(e^{-i\lambda})\,d\lambda = \delta_{j,l}$$

where $\delta$ is the Kronecker delta. We then define the matrix $T_k$ by:

/ ^1,1 ... \ 




^'2,1 ^2,2 



Tfc verifies the following conditions: 
and so 

Using ([39]) . we obtain that: 



\ ^yt,2 • • • 



k,k j 



E 



itrace (S"^/?) 
1 



T 

with H defined in Theorem 13.4.11 We define ■ A ^ X]j=o ^j,k^ 



trace (TkHT^) 



E 



can therefore be rewritten like: 



-trace ( ( / /2(A)Re [Gk{X)^j{e'^^)) Re [Gk{-X)M(^- 



dA 



^trace I l^j^ /'(A)Re (Gfc(A)cI>,(e*^'^)Gfc(-A)$Ke"*'^)) dA 



1 



-trace / /2(A)Im Gfe(A)cI>,(e^^^^) Im Gfe(-A)cl>,(e-*'^) dA 



because Re(a6) = Re(a)Re(6) — Im(a)Im(6). For later convenience, we note: 
A := [ I /2(A)Re(Gfe(A)$,(e^^))Re(Gfc(-A)$;(e"'^))dA 



B 



C :-- 



fiX)Re (Gk{X)<^>j{e'^)Gk{-X)'^iie~'^)] dA 



/2(A)|Gfc(A)|2$,(e*'^)cI>,(e-'^)dA 
/2(A)Im (Gfc(A)cI>,(e*^) ) Im ( Gfc(-A)$Ke-*'') ) dA 



(39) 



(40) 
Then we have $A = B + C$. We will prove that $A$, $B$ and $-C$ are symmetric and positive matrices, which implies that $0 \le \operatorname{trace}(A) \le \operatorname{trace}(B)$. First we study the symmetry: $A$ is symmetric because the real part of a complex number is equal to that of its conjugate, $B$ is symmetric because $\lambda \mapsto f^2(\lambda)|G_k(\lambda)|^2$ is a symmetric function and $C$ is symmetric because the imaginary part of a complex number is equal to the negative of the imaginary part of its conjugate. Next we study the positivity. Let $q := (q_1, \dots, q_k)$ be a vector. We have:

$$qAq^* = \int_{-\pi}^{\pi} f^2(\lambda)\,\mathrm{Re}\left(\sum_{j=1}^{k} G_k(\lambda)\,q_j\,\Phi_j(e^{i\lambda})\right)\mathrm{Re}\left(\sum_{l=1}^{k} G_k(-\lambda)\,q_l\,\Phi_l(e^{-i\lambda})\right)d\lambda \ge 0$$

$$qBq^* = \int_{-\pi}^{\pi} f^2(\lambda)\,|G_k(\lambda)|^2\sum_{j=1}^{k} q_j\,\Phi_j(e^{i\lambda})\sum_{l=1}^{k} q_l\,\Phi_l(e^{-i\lambda})\,d\lambda \ge 0$$

$$qCq^* = \int_{-\pi}^{\pi} f^2(\lambda)\,\mathrm{Im}\left(\sum_{j=1}^{k} G_k(\lambda)\,q_j\,\Phi_j(e^{i\lambda})\right)\mathrm{Im}\left(\sum_{l=1}^{k} G_k(-\lambda)\,q_l\,\Phi_l(e^{-i\lambda})\right)d\lambda \le 0.$$

The traces of the matrices $A$, $B$ and $-C$ are equal to the sums of their eigenvalues, since they are symmetric and thus diagonalizable. Because these matrices are positive, all their eigenvalues are non-negative and so are their traces. Therefore we obtain that:

$$0 \le \operatorname{trace}(A) \le \operatorname{trace}(B). \qquad (41)$$

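To find a bound for $\operatorname{trace}(A)$, it is sufficient to find a bound for $\operatorname{trace}(B)$:

$$\operatorname{trace}(B) = \sum_{j=1}^{k}\int_{-\pi}^{\pi} f^2(\lambda)\,|G_k(\lambda)|^2\,\Phi_j(e^{i\lambda})\,\Phi_j(e^{-i\lambda})\,d\lambda = \int_{-\pi}^{\pi} f^2(\lambda)\,|G_k(\lambda)|^2\,K_k(e^{i\lambda}, e^{i\lambda})\,d\lambda$$

where $K_k$ is the reproducing kernel defined by:

$$\forall x, y \in \mathbb{C},\quad K_k(x,y) = \sum_{j=1}^{k}\Phi_j(x)\,\overline{\Phi_j(y)}.$$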

We have assumed that the spec tral density / is bounded from below by a positive constant c, so we 
can apply the Theorem 2.2.4 of [Simon and we get: 

VA G [-TT,Tr],Kk{e'^,e'^) < k—. 

We look for a bound for $|G_k(\lambda)|^2$:

$$\forall \lambda \in [-\pi,\pi],\quad |G_k(\lambda)|^2 \le \left(\sum_{j=0}^{k}|a_{j,k}|\right)^2 = O(1)$$

as we have proven in (38). This bound is independent of $\lambda$. We finally notice that if $0 < d < 1/4$ then $f$ is square integrable. So we obtain that:

$$\operatorname{trace}(B) = O(k)$$

and we conclude using (40) and (41) that:

$$\mathbb{E}\left[\left(\widehat{X}_{T,k}(1) - \widehat{X}_k(1)\right)^2\right] = O\left(\frac{k}{T}\right).$$

□



3.5 Conclusion 

Fitting an AR($k$) model also involves two mean-squared error components: the first is due to fitting a misspecified model and is bounded by $O(k^{-1})$, and the second is due to the estimation of the Yule-Walker coefficients $a_{j,k}$ from an independent realisation of length $T$ and is bounded by $O(k/T)$ if $0 < d < 1/4$ (Bhansali (1978) has the same asymptotic equivalent for short-memory processes), by $O(k^{1/2}\log(T)/T)$ if $d = 1/4$ and by $O(k^{1-2d}/T^{2-4d})$ if $1/4 < d < 1/2$. As in Section 2.3, if we want to compare the two types of forecast error, we need to state a relation between $k$ and $T$ and moreover distinguish three cases for the value of $d$.



In both methods, truncating the Wiener-Kolmogorov predictor to $k$ terms or fitting an AR($k$) model, the mean-squared error of prediction due to the method is bounded by $O(k^{-1})$. Nevertheless, the factor of $k^{-1}$ in this equivalent depends on $d$. We have shown that the factor tends to infinity when $d$ tends to 1/2 in the method by truncation in the special case of fractionally integrated noise (Section 2.1), so that the error increases for $d$ near 1/2. Moreover, for such values of $d$, Figure 3.1 shows that fitting an AR($k$) model greatly reduces the error. For the errors due to the estimation of the forecast coefficients, the method by truncation is optimal since if we assume that $T/k$ tends to infinity (a necessary condition for the mean-squared error to converge to 0), then for all $d$ in $]0, 1/2[\,\setminus\{1/4\}$:



E 



l)-X'(l) =0 E 



{X^k{l)-Xk{l) 



In the end, to decide on a prediction method we have to consider the value of the long-memory parameter $d$ and the lengths $k$ and $T$ of the series.



4 Appendix 

4.1 Proof of Lemma 2.1.1

Let g be the function (/, j) ^ j--rf-i+<5^-rf-i+<5|^ _ j]2d-i+5 ^ j^^^ ^ ^ integers. We 

assume that 5 < 1 — 2d and that m > ^^2d-i ^ ^ 

[n, n + 1] X [m, m+1]. Ifn>m- + 1 then 




9il',j)djdl > g{n + l,m). 



0, 



(5+2d-l 



We introduce An^m the square 
Proof. We restrict the domain of $g$ to the square $A_{n,m}$. First we show that $g(\cdot, j)$ is decreasing by computing its derivative:

{g{j, .))' (/) = [i-d - 1 + 6)r' + {2d-l + 6){l- j)-'] j-d-i+^i-d-i+s^i _ 
< 

since 5 < 1 — 2d. We show then that g{l, .) is increasing: 

{g{., I))' (j) = [i-d - 1 + 6)j-' -(2d-l + 5){l- 3)-^] - jf 



> 



because 



J > 



6-d-l 



S + 2d-l' 

Then the function g attains its minimum at (n + 1, m) and we have 

y{l,j)^An,m,g{l,j) > g{n + l,m) 



5(^j)djd/ > g{n + l,m). 



The result follows.



□ 



4.2 Proof of Theorem 3.4.1

By assumption, the process $(Y_n)_{n\in\mathbb{Z}}$ is Gaussian. We also assume that its autocovariance function $\sigma$ verifies:

$$\sigma(j) \sim \lambda j^{2d-1} \quad\text{with } \lambda > 0,$$

that the coefficients $b_j$ of its moving average representation are such that:

$$b_j \sim \delta j^{d-1} \quad\text{with } \delta > 0,$$

and that the white-noise series $(\varepsilon_n)_{n\in\mathbb{Z}}$ has finite fourth moments. Let $g_{i,j}$ be the function:

{xo,...,Xk) ^ iUi - ai,k){yj - aj^k) 



with 



/ yi 

\ Vk 



( Xl 

Xi Xo 



Xk 
Xk-l 



\ Xk Xk-l ■■■ Xq J 





( Xl\ 




\ Xk ) 
Therefore



E{gij{a{0),a{l),...,a{k))) 

2 



l-Etiar,.) ^^(S,-^lMS,-^),„, + 0(n-3/2) ifd=l/4 



n 



where $C$ and $D$ are constants independent of $n$ and $k$. The definition of the matrix $H$ follows. We define $h$ as $h(\lambda) = \left|1 - \sum_{r=1}^{k} a_{r,k}e^{ir\lambda}\right|^2$ and we denote by $h^{(i)}$ the derivative of the function $h$ with respect to $a_{i,k}$. The $(i,j)$-th entry of the matrix $H$ is given by:

$$\int_{-\pi}^{\pi} h^{(i)}(\lambda)\,h^{(j)}(\lambda)\,f^2(\lambda)\,d\lambda. \qquad (42)$$



Proof. We write a second-order Taylor expansion of the function $g_{i,j}$, applying Theorem 5.4.3 in Fuller (1976) as in Section 2.2. We will refer to the following version:

(i) if $\mathbb{E}\left(\left|\widehat{\sigma}(k) - \sigma(k)\right|^3\right) = O(a_n)$;

(ii) if $g_{i,j}$ is uniformly bounded;

(iii) if the first and the second derivatives of $g_{i,j}$ are continuous and bounded functions on a neighbourhood of $(\sigma(0), \dots, \sigma(k))$



then 



¥.Uj{a{Q),a{l),...,a{k)) 



^^g^,, (^(o),...,^(A;))E((a(0-a(/))(^M-^M))+0(a„). 



4ee 



2 ^-^ ^-^^ dxidxm 

1=0 m=0 ' 



We first verify that the assumptions hold. We need a bound for the third order moments of the 
empirical covariances. 



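Lemma 4.2.1.

$$\mathbb{E}\left|\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right|^3 = \begin{cases} O(n^{-3/2}) & \text{if } d < 1/4 \\ O(n^{6d-3}) & \text{if } d > 1/4 \end{cases} \qquad (43)$$

Proof. Lemma 4.2.1 is proven in Section 4.3. □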



In this way we obtain a bound for the remainder of the Taylor series. Moreover $g_{i,j}$ is a uniformly bounded function since its values are built from the coefficients of the autoregressive process. Since the derivatives of $g_{i,j}$ at $(\sigma(0), \dots, \sigma(k))$ are finite, there exists a neighbourhood of $(\sigma(0), \dots, \sigma(k))$ on which all the derivatives are uniformly bounded. So we can apply Theorem 5.4.3 in Fuller (1976). First we note that:

$$g_{i,j}(\sigma(0), \dots, \sigma(k)) = 0$$
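and

$$\forall l \in [0,k],\quad \frac{\partial g_{i,j}}{\partial x_l}(\sigma(0), \dots, \sigma(k)) = \left[\frac{\partial y_i}{\partial x_l}\,(y_j - a_{j,k}) + \frac{\partial y_j}{\partial x_l}\,(y_i - a_{i,k})\right](\sigma(0), \dots, \sigma(k)) = 0 \qquad (44)$$

because

$$\forall i \in [1,k],\quad y_i(\sigma(0), \dots, \sigma(k)) - a_{i,k} = 0.$$

From the Taylor series and Lemma 4.2.1 it follows that: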

' Eto eLo a^(^(o)> • • • > mmi^) - <^m^) - ^M)) + o(n-3/2) 

if < d< 1/4 

Eto ELo a^(^(0)' • • • > ^(^))E((^(r) - ct(/))(^M - a(m))) + 0{n'''~') 
1/4 <d< 1/2 

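According to the results of Hosking (1996), we can compute the second term of the Taylor series. First we note that:

$$\frac{\partial^2 g_{i,j}}{\partial x_l\,\partial x_m}(\sigma(0), \dots, \sigma(k)) = \frac{\partial y_i}{\partial x_l}\frac{\partial y_j}{\partial x_m}(\sigma(0), \dots, \sigma(k)) + \frac{\partial y_i}{\partial x_m}\frac{\partial y_j}{\partial x_l}(\sigma(0), \dots, \sigma(k))$$

because the other terms vanish at $(\sigma(0), \dots, \sigma(k))$ by (44). Moreover we can apply Hosking (1996):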



E((cT(0-cT(0)(a(m)-a(m))) 
Dn^^ ln(n) 



n— >+oo 
~ < 



if J < d<l 
if d=l 



n 



where C, and F are constants and independent of / and m. Consequently, we can compute: 

k k r. 



EE 



dxidxm 



-(ct(O), . . . , aik))E{{a{l) - a{l)){a{m) - a{m))). 



Z=0 m=0 

First we study the case d > 1/4 and we prove that: 

dy 



j:gKo),...,a(^))=(i-j: 



O^r-k 



r=l 



( ( 1 



V 1 



(45) 
since, if we define $c_0 := (\sigma(0), \dots, \sigma(k))$, we may write the partial derivative as:



( 


o 








\ 


dxi y 


/ 


" / 




d 




dxi 


\ 


. \ 



Xk-l 



k 



Using (j45p . the result follows because: 

k k 



EE 

l=Q m=0 



dxidxr 



■(a(0), . . . , a(A:))E((a(0 - a(0)(cT(m) - a{m))) 



k \ ^ 

1-Er=iar,fe) -Dn"Mn(n) (E^4fe,fcE^^), .s if d 



if 3 < d < i 



When d < 1/4, we first notice by using (j46p that: 



Then it follows that: 

k k 



V <T{k) 



EE 

i=0 m=0 



dgi,j 
dxidxy 

dyi dyj 



-(a(0), . . .,a{k)Miail) - c7(/))(c7(m) - (7(m))) 



EESf^ E K«M« + ^-m) + a(.M. + / + m)) 

i=0 m=0 '■ ^ 



k k 



- 

2 ^ ^„ 9xj ax„ 

1=0 m=0 ' " 



-oo 
oo 



(46) 



((T(s)cr(s + / — m) + cr(s)(T(s + m — /) + cr(s)cr(s + Z + m) + cj(s)cj(s 



k k 



s=— oo 
oo 



-EE 1^1^ E ^(^) r/(A)e*^^(e^('-'")Ve*("^-')Ve*(™+')Ve*(-'"-')^M^ 



i=0 m=0 s=-oo 



2EE|f^ E ^(-) r/(A)e-^cos(/A)cos(mA)dA 

Z=0 m=0 ' s=-oo -^^^ 



k k 



2 r ^ e-V(.)/(A) ^ ^ 



dxi dxm 



cos(ZA) cos(mA)dA 



k k 



2/ /(A)^EE|f^-^(^^)-«(-^)dA 



Z=0 m=0 



I (JXm 



2 (Sfc ^i^E-i) 
with $H$ defined in (42).

□



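4.3 Proof of Lemma 4.2.1

We want to show that:

$$\mathbb{E}\left|\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right|^3 = \begin{cases} O(n^{-3/2}) & \text{if } d < 1/4 \\ O(n^{6d-3}) & \text{if } d > 1/4 \end{cases} \qquad (47)$$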



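Proof. By the Cauchy-Schwarz inequality,

$$\mathbb{E}\left|\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right|^3 \le \sqrt{\mathbb{E}\left[\left(\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right)^2\right]}\;\sqrt{\mathbb{E}\left[\left(\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right)^4\right]}.$$

We will consider the two terms separately. First we have: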



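$$\mathbb{E}\left[\left(\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right)^2\right] = \sigma(k)^2 - \frac{2\sigma(k)}{n}\,\mathbb{E}\left(\sum_{t=1}^{n-k} X_tX_{t+k}\right) + \frac{1}{n^2}\,\mathbb{E}\left(\sum_{t=1}^{n-k} X_tX_{t+k}\sum_{s=1}^{n-k} X_sX_{s+k}\right).$$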



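Since the process is Gaussian, we have (see Triantafyllopoulos (2003)):

$$\mathbb{E}\left(X_tX_{t+k}X_sX_{s+k}\right) = \mathbb{E}\left(X_tX_{t+k}\right)\mathbb{E}\left(X_sX_{s+k}\right) + \mathbb{E}\left(X_tX_s\right)\mathbb{E}\left(X_{t+k}X_{s+k}\right) + \mathbb{E}\left(X_tX_{s+k}\right)\mathbb{E}\left(X_{t+k}X_s\right)$$

and thus: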



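$$\mathbb{E}\left[\left(\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right)^2\right] = \left(\frac{(n-k)^2}{n^2} - \frac{2(n-k)}{n} + 1\right)\sigma(k)^2 + \frac{1}{n^2}\sum_{t=1}^{n-k}\sum_{s=1}^{n-k}\left(\sigma(t-s)^2 + \sigma(t+k-s)\,\sigma(s+k-t)\right)$$

$$= \frac{k^2}{n^2}\,\sigma(k)^2 + \frac{1}{n^2}\sum_{t=1}^{n-k}\sum_{s=1}^{n-k}\left(\sigma(t-s)^2 + \sigma(t+k-s)\,\sigma(s+k-t)\right).$$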

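We note that

$$\sum_{t=1}^{n-k}\sum_{s=1}^{n-k}\sigma(t-s)^2 = (n-k)\,\sigma(0)^2 + 2\sum_{t=1}^{n-k}(n-k-t)\,\sigma(t)^2$$

$$= O(n) + 2(n-k)\sum_{t=1}^{n-k}\sigma(t)^2 - 2\sum_{t=1}^{n-k} t\,\sigma(t)^2$$

$$= O(n) + O(n^{4d}).$$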



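In a similar way for:

$$\sum_{t=1}^{n-k}\sum_{s=1}^{n-k}\sigma(t+k-s)\,\sigma(s+k-t)$$

we obtain that:

$$\sqrt{\mathbb{E}\left[\left(\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right)^2\right]} = \begin{cases} O(n^{-1/2}) & \text{if } d < 1/4 \\ O(n^{2d-1}) & \text{if } d > 1/4 \end{cases} \qquad (48)$$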
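The second-moment computation above can be checked numerically. The Python sketch below evaluates the exact Gaussian expression $\frac{k^2}{n^2}\sigma(k)^2 + \frac{1}{n^2}\sum_{t,s}\left(\sigma(t-s)^2 + \sigma(t+k-s)\sigma(s+k-t)\right)$ for any autocovariance function supplied by the caller, collapsing the double sum over $(t,s)$ into a single sum over the lag $u = t - s$. The function names and the $F(d)$ helper (which assumes unit innovation variance) are illustrative additions, not part of the paper.

```python
import math


def mse_empirical_autocov(sigma, n, k):
    """E[((1/n) sum_{t=1}^{n-k} X_t X_{t+k} - sigma(k))^2] for a Gaussian process.

    sigma: function of a non-negative integer lag returning the autocovariance.
    """
    m = n - k
    total = 0.0
    for u in range(-(m - 1), m):
        # (m - |u|) pairs (t, s) in [1, m]^2 have t - s = u.
        pairs = m - abs(u)
        total += pairs * (sigma(abs(u)) ** 2 + sigma(abs(u + k)) * sigma(abs(k - u)))
    return (k / n) ** 2 * sigma(k) ** 2 + total / n ** 2


def fd_autocov_function(d, max_lag):
    """Autocovariance of F(d) as a function of the lag (unit innovation variance)."""
    table = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for h in range(1, max_lag + 1):
        table.append(table[-1] * (h - 1 + d) / (h - d))
    return lambda h: table[h]
```

As a usage example, calling `mse_empirical_autocov(fd_autocov_function(d, 2 * n), n, k)` for increasing `n` lets one compare the decay of the mean-squared error for values of `d` below and above 1/4.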



For the second term, we have: 



E 



\-y^xtXt+k-cT{k) 



t=i 



a{k)^ - 4a(A;)E 



n—k 



t=l 



/ n—k 



n 



■E ^X^Xt+fc +E 



+ 



n—k 



&a{kf 



■E 



n—k 



Since the process is Gaussian, we can apply the result in Triantafyllopoulos (2003) and develop the moments as functions which depend only on the covariances of the process. Then we count the order of $\sigma(k)$ in each term of the sum. The coefficient of $\sigma(k)^4$ is:



1 



4(n - A;)^ 6(n - k)^ 4(n - k) {n - k) 



+ 



+ 



n 



the coefficient of cr{k) is: 

/ n—k n—k 



^ ^ cr(t - + a{t + k- s)a{s + k - t) 



a=i s=l 



-I2{n-k) 6 Q{n-ky 



^n—k n—k 



^ ^ (T(t - s)^ + a{t + k- s)a{s + k - t) 



,t=l s=l 
-3\ 



0{n--^) if(i<l/4 
Q(^-4+4d) ifd>i/4 



and the coefficient of cr{k) is: 

n—k n—k n—k 



— 6(T(t — s)a{r — s)a{r — t + k) + a{t + k — r)a{s + k — r)a{r + k — s) 



t=l s=l r=l 

-4 4(n - k) 
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We study this asymptotic behaviour as follows: 

n—k n—k n—k 



E E E - ^)^(^ - *)^('^ - * + ^) 
t=l s=l r=l 

g n — k n — k n—k 

t=l s=l r=l 
fn—k i-n—k /"n—k 

~ ^ / / / li-.sP'^-Mr-.s|2«'-i|r-t + fcP'^-MMsdr 



pn—K pn—K pn—K 

./ / / lt-s\'"'-\-s\'"'-'\r-t + kf''-\ 
Jl Ji Jo 

f? pn pn pn pn 

< ^J^ 1^ 1^ \t-s\"'-'\r-s\"'-'\r-t\^''-'dtdsdr 
~ en^-^-^ ['\t-s\'"'-'\r-sf''-'\r-t\'"'-'dtdsdr 



^0 JO 



The factor of a{k) is bounded by 0(ra^'^~^). The constant terms are either like: 



li — k n — k 71 — k n — k 



m m m XI ~ *)^(^ ~ ^)^(* ~ ^)^(^ ~ ^) 



n 

t=l s=l r=l D=l 



According to a comparison with an integral, they are bounded by 0(n ), or they are like: 

n—k n—k n—k n—k 



^ II, — n. II, — n, II, — n, 1 1, n, 

t=l s=l r=l v=l 

We separate the two sums and using the previous results we obtain that:

^ n—k n—k n—k n—k ( r^/ _9\ t- ? ^ t ia 

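When we sum the different components, we obtain that:

$$\sqrt{\mathbb{E}\left[\left(\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right)^4\right]} = \begin{cases} O(n^{-1}) & \text{if } d < 1/4 \\ O(n^{4d-2}) & \text{if } d > 1/4 \end{cases} \qquad (50)$$

Finally, we have obtained that:

$$\mathbb{E}\left|\frac{1}{n}\sum_{t=1}^{n-k} X_tX_{t+k} - \sigma(k)\right|^3 = \begin{cases} O(n^{-3/2}) & \text{if } d < 1/4 \\ O(n^{6d-3}) & \text{if } d > 1/4 \end{cases} \qquad (51)$$

□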